Train K-Nearest Neighbor Classifier (Spatial Analyst)

Available with Image Analyst license.

Available with Spatial Analyst license.

Summary

Generates an Esri classifier definition file (.ecd) using the K-Nearest Neighbor classification method.

The K-Nearest Neighbor classifier is a nonparametric classification method that classifies a pixel or segment by a plurality vote of its neighbors. K is the defined number of neighbors used in voting.
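
The plurality vote described above can be illustrated with a minimal pure-Python sketch. It is not the tool's implementation: the Euclidean distance metric, the tie-breaking behavior, the function name `knn_classify`, and the sample data are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=1):
    """Return the class that wins a plurality vote among the k nearest samples.

    training is a list of (feature_vector, class_value) pairs.
    """
    # Order training samples by Euclidean distance to the query (assumed metric).
    by_distance = sorted(training, key=lambda item: math.dist(sample, item[0]))
    votes = Counter(class_value for _, class_value in by_distance[:k])
    # Ties go to the class encountered first, that is, the nearer neighbor.
    return votes.most_common(1)[0][0]


# Hypothetical two-band training samples: (band values, classvalue)
training = [((10, 10), 1), ((12, 11), 1), ((11, 13), 1),
            ((50, 52), 2), ((14, 12), 2)]

print(knn_classify((13, 12), training, k=1))  # nearest single neighbor is class 2
print(knn_classify((13, 12), training, k=3))  # two of the three nearest are class 1
```

With K set to 1, the single stray class 2 sample decides the result; with K set to 3, it is outvoted, which shows how a larger K reduces the influence of any individual neighbor.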

Usage

  • The tool assigns training samples to their respective classes. The class of the input pixel is determined by a plurality vote of its K-nearest neighbors.

  • Any Esri-supported raster is accepted as input, including raster products, segmented rasters, mosaics, image services, and generic raster datasets. Segmented rasters must be 8-bit rasters with three bands.

  • The output of this tool is an .ecd file that is used to classify new rasters in the Classify Raster tool. The Classify Raster tool then calculates the distance from each input pixel or segment to all training samples.

  • To classify a time series of raster data using the change analysis raster output from the Analyze Changes Using CCDC tool, the training sample data must have been collected at multiple times using the Training Samples Manager. The dimension value for each sample is listed in a field in the training sample feature class, which is specified in the Dimension Value Field parameter.

  • To create the training sample file, use the Training Samples Manager pane from the Classification Tools drop-down menu.

  • For segmented rasters that have their key property set to Segmented, the tool computes the index image and associated segment attributes from the RGB segmented raster. The attributes are computed to generate the classifier definition file to be used in a separate classification tool. The attributes for each segment can be computed from any Esri-supported image.

  • The Segment Attributes parameter is only active if one of the raster layer inputs is a segmented image.
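
The two-step workflow described in the usage notes (train a classifier definition, then apply it with Classify Raster) might be scripted along the following lines. This is a sketch under assumptions: `build_train_options` and `train_and_classify` are hypothetical helper names, the paths in the commented call are placeholders, and a Spatial Analyst license is assumed to be available. The keyword names come from the tool syntax on this page.

```python
def build_train_options(k=1, max_samples_per_class=1000, attributes=None):
    """Collect optional tool arguments, keyed by the names in the tool syntax."""
    options = {"kNN": k, "max_samples_per_class": max_samples_per_class}
    if attributes:
        # Multivalue parameters are passed as one semicolon-delimited string.
        options["used_attributes"] = ";".join(attributes)
    return options


def train_and_classify(in_raster, training_features, ecd_path, out_raster,
                       **options):
    """Train a K-Nearest Neighbor classifier, then apply it to the raster."""
    # Imported here so build_train_options works without ArcGIS installed.
    import arcpy
    from arcpy.sa import ClassifyRaster, TrainKNearestNeighborClassifier

    arcpy.CheckOutExtension("Spatial")
    TrainKNearestNeighborClassifier(in_raster, training_features, ecd_path,
                                    **options)
    classified = ClassifyRaster(in_raster, ecd_path)
    classified.save(out_raster)
    return out_raster


# Hypothetical call with placeholder paths:
# train_and_classify(r"C:\Data\landsat.tif", r"C:\Data\training_sample.shp",
#                    r"C:\Data\trained_knn.ecd", r"C:\Data\classified.tif",
#                    **build_train_options(k=5))
```

Passing the optional arguments by keyword avoids depending on their position after the optional additional raster.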

Parameters

Label | Explanation | Data Type
Input Raster

The raster dataset to classify.

This can be a single-band, multiband, segmented, or multidimensional raster.

Mosaic Layer; Raster Layer; Image Service; String
Input Training Sample File

The training sample file or layer that delineates the training sites.

These can be either shapefiles or feature classes that contain the training samples. The following field names are required in the training sample file:

  • classname—A text field indicating the name of the class category
  • classvalue—A long integer field containing the integer value for each class category

Feature Layer
Output Classifier Definition File

A JSON formatted .ecd file that contains attribute information, statistics, or other information for the classifier.

File
Additional Input Raster
(Optional)

Ancillary raster datasets, such as a multispectral image or a DEM, that will be incorporated to generate attributes and other required information for classification.

Raster Layer; Mosaic Layer; Image Service; String
K Nearest Neighbors
(Optional)

The number of neighbors that will be used in searching for each input pixel or segment. Increasing the number of neighbors will decrease the influence of individual neighbors on the outcome of the classification. The default value is 1.

Long
Max Number of Samples Per Class
(Optional)

The maximum number of training samples that will be used for each class. The default value of 1000 is recommended when the inputs are nonsegmented rasters. A value that is less than or equal to 0 means that the system will use all the samples from the training sites to train the classifier.

Long
Segment Attributes
(Optional)

Specifies the attributes that will be included in the attribute table associated with the output raster.

This parameter is only active if the Segmented key property is set to true on the input raster. If the only input to the tool is a segmented image, the default attributes are Converged color, Count of pixels, Compactness, and Rectangularity. If an Additional Input Raster value is included as an input with a segmented image, Mean digital number and Standard deviation are also available attributes.

  • Converged color: The RGB color values will be derived from the input raster on a per-segment basis. This is also known as average chromaticity color.
  • Mean digital number: The average digital number (DN) will be derived from the optional pixel image on a per-segment basis.
  • Standard deviation: The standard deviation will be derived from the optional pixel image on a per-segment basis.
  • Count of pixels: The number of pixels composing the segment, on a per-segment basis.
  • Compactness: The degree to which a segment is compact or circular, on a per-segment basis. The values range from 0 to 1, in which 1 is a circle.
  • Rectangularity: The degree to which the segment is rectangular, on a per-segment basis. The values range from 0 to 1, in which 1 is a rectangle.

String
Dimension Value Field
(Optional)

The field in the input training sample feature class that contains the dimension values.

This parameter is required to classify a time series of raster data using the change analysis raster output from the Analyze Changes Using CCDC tool in the Image Analyst toolbox.

Field

TrainKNearestNeighborClassifier(in_raster, in_training_features, out_classifier_definition, {in_additional_raster}, {kNN}, {max_samples_per_class}, {used_attributes}, {dimension_value_field})
Name | Explanation | Data Type
in_raster

The raster dataset to classify.

This can be a single-band, multiband, segmented, or multidimensional raster.

Mosaic Layer; Raster Layer; Image Service; String
in_training_features

The training sample file or layer that delineates the training sites.

These can be either shapefiles or feature classes that contain the training samples. The following field names are required in the training sample file:

  • classname—A text field indicating the name of the class category
  • classvalue—A long integer field containing the integer value for each class category

Feature Layer
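
Because the tool requires the classname and classvalue fields, it can be useful to check a training sample file before training. The following is a minimal sketch; the helper name `missing_training_fields` is hypothetical, and the case-insensitive comparison is an assumption rather than documented tool behavior.

```python
REQUIRED_FIELDS = ("classname", "classvalue")


def missing_training_fields(field_names):
    """Return the required field names that are absent from field_names."""
    # Compare case-insensitively (an assumption made for this sketch).
    present = {name.lower() for name in field_names}
    return [name for name in REQUIRED_FIELDS if name not in present]


print(missing_training_fields(["FID", "Shape", "Classname", "Classvalue"]))
print(missing_training_fields(["FID", "Shape", "classname"]))
```

In an ArcGIS session, the field names could be gathered with `[f.name for f in arcpy.ListFields(in_training_features)]` and passed to the helper.
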
out_classifier_definition

A JSON formatted .ecd file that contains attribute information, statistics, or other information for the classifier.

File
in_additional_raster
(Optional)

Ancillary raster datasets, such as a multispectral image or a DEM, that will be incorporated to generate attributes and other required information for classification.

Raster Layer; Mosaic Layer; Image Service; String
kNN
(Optional)

The number of neighbors that will be used in searching for each input pixel or segment. Increasing the number of neighbors will decrease the influence of individual neighbors on the outcome of the classification. The default value is 1.

Long
max_samples_per_class
(Optional)

The maximum number of training samples that will be used for each class. The default value of 1000 is recommended when the inputs are nonsegmented rasters. A value that is less than or equal to 0 means that the system will use all the samples from the training sites to train the classifier.

Long
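
The effect of max_samples_per_class can be pictured with the sketch below, in which a value less than or equal to 0 keeps every sample. How the tool chooses which samples to keep is not stated on this page; the random subsampling, the fixed seed, and the helper name `cap_samples_per_class` are assumptions for illustration only.

```python
import random
from collections import defaultdict


def cap_samples_per_class(samples, max_per_class=1000, seed=0):
    """Limit each class to max_per_class samples; <= 0 keeps all samples.

    samples is a list of (feature_vector, class_value) pairs.
    """
    if max_per_class <= 0:
        return list(samples)
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    capped = []
    for members in by_class.values():
        if len(members) > max_per_class:
            # Random subsampling is an assumption, not documented behavior.
            members = rng.sample(members, max_per_class)
        capped.extend(members)
    return capped


# Hypothetical samples: five of class 1 and two of class 2
samples = [((i,), 1) for i in range(5)] + [((i,), 2) for i in range(2)]
print(len(cap_samples_per_class(samples, max_per_class=3)))  # 3 + 2 samples kept
print(len(cap_samples_per_class(samples, max_per_class=0)))  # all samples kept
```
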
used_attributes
[used_attributes;used_attributes,...]
(Optional)

Specifies the attributes that will be included in the attribute table associated with the output raster.

  • COLOR: The RGB color values will be derived from the input raster on a per-segment basis. This is also known as average chromaticity color.
  • MEAN: The average digital number (DN) will be derived from the optional pixel image on a per-segment basis.
  • STD: The standard deviation will be derived from the optional pixel image on a per-segment basis.
  • COUNT: The number of pixels composing the segment, on a per-segment basis.
  • COMPACTNESS: The degree to which a segment is compact or circular, on a per-segment basis. The values range from 0 to 1, in which 1 is a circle.
  • RECTANGULARITY: The degree to which the segment is rectangular, on a per-segment basis. The values range from 0 to 1, in which 1 is a rectangle.

This parameter is only enabled if the Segmented key property is set to true on the input raster. If the only input to the tool is a segmented image, the default attributes are COLOR, COUNT, COMPACTNESS, and RECTANGULARITY. If an in_additional_raster value is included as an input with a segmented image, MEAN and STD are also available attributes.

String
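
This page does not give the formulas behind COMPACTNESS and RECTANGULARITY. As a rough intuition only, the sketch below uses two common shape measures that match the stated 0 to 1 ranges: the isoperimetric quotient (1 for a circle) and the ratio of segment area to bounding-rectangle area (1 for a rectangle). Both formulas and both function names are assumptions and may differ from what the tool computes.

```python
import math


def compactness(area, perimeter):
    """Isoperimetric quotient: 1.0 for a circle, smaller for other shapes."""
    return 4 * math.pi * area / perimeter ** 2


def rectangularity(area, bbox_width, bbox_height):
    """Share of the bounding rectangle that the shape fills: 1.0 for a rectangle."""
    return area / (bbox_width * bbox_height)


# A circle of radius 2 compared with a 3 x 3 square
circle_area, circle_perimeter = math.pi * 2 ** 2, 2 * math.pi * 2
print(compactness(circle_area, circle_perimeter))  # circle: approximately 1
print(compactness(9, 12))                          # square: pi / 4
print(rectangularity(9, 3, 3))                     # square fills its bounding box
print(rectangularity(circle_area, 4, 4))           # circle fills pi / 4 of its box
```
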
dimension_value_field
(Optional)

The field in the input training sample feature class that contains the dimension values.

This parameter is required to classify a time series of raster data using the change analysis raster output from the Analyze Changes Using CCDC tool in the Image Analyst toolbox.

Field

Code sample

TrainKNearestNeighborClassifier example 1 (Python window)

This is a Python sample for the TrainKNearestNeighborClassifier function.

# Import system modules 
import arcpy 
from arcpy.sa import * 
 
# Check out the ArcGIS Spatial Analyst extension license 
arcpy.CheckOutExtension("Spatial") 
 
# Execute  
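# kNN and used_attributes are passed by keyword so that they are not read as
# the optional in_additional_raster argument, which precedes them in the syntax
arcpy.sa.TrainKNearestNeighborClassifier("landsat.tif", "training_sample.shp", r"c:\data\trained_knn.ecd", kNN=5, used_attributes="COLOR;MEAN;STD;COUNT;COMPACTNESS;RECTANGULARITY")

TrainKNearestNeighborClassifier example 2 (stand-alone script)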

This is a Python script sample for the TrainKNearestNeighborClassifier function.

# Import system modules 
import arcpy 
from arcpy.sa import * 
 
# Check out the ArcGIS Spatial Analyst extension license 
arcpy.CheckOutExtension("Spatial") 
 
# Define input parameters 
in_raster = r"C:/Data/landsat.tif" 
in_training_features = r"C:/Data/training_sample.shp" 
out_classifier_definition = r"C:/Data/trained_knn.ecd" 
number_of_neighbors = 5
attributes = "COLOR;MEAN;STD;COUNT;COMPACTNESS;RECTANGULARITY"
     
# Execute  - train K-Nearest Neighbor Classifier
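# Optional arguments are passed by keyword because in_additional_raster
# precedes kNN and used_attributes in the tool syntax
arcpy.sa.TrainKNearestNeighborClassifier(in_raster, in_training_features,
                                         out_classifier_definition,
                                         kNN=number_of_neighbors,
                                         used_attributes=attributes)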
