Description
The files in the input directory are not a supported type.
Solution
Use one of the following supported formats:
- A feature class or table containing a text field with the input text for the model, plus fields holding the labelled entities. The selected text field is used as the input text for the model, and the remaining fields are treated as named entity labels.
- A folder containing training data in the form of standard datasets for NER tasks. The training data must be in .json or .csv files. The file format determines the dataset type of the input.
- When the input is a folder, the following dataset types are supported:
  - ner_json—The training data folder should contain a .json file with text and the labelled entities formatted using the spaCy JSON training format.
  - IOB—The IOB (I - inside, O - outside, B - beginning tags) format described in Text Chunking using Transformation-Based Learning.
    The training data folder should contain the following two .csv files:
    - tokens.csv—Contains text as input chunks
    - tags.csv—Contains IOB tags for the text chunks
  - BILUO—An extension of the IOB format that additionally contains L - last and U - unit tags.
    The training data folder should contain the following two .csv files:
    - tokens.csv—Contains text as input chunks
    - tags.csv—Contains BILUO tags for the text chunks
For more information about these formats and labelling data in these formats, see the Labelling text using Doccano guide.
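Before running the tool, it can help to check a training folder against the dataset types listed above. The following Python snippet is a minimal sketch, not the tool's own validation logic. The function name guess_dataset_type is hypothetical, and the snippet assumes that the tags in tags.csv are separated by commas or whitespace and that L- or U- prefixes indicate BILUO data.

```python
import csv
from pathlib import Path
from typing import Optional


def guess_dataset_type(folder: str) -> Optional[str]:
    """Guess 'ner_json', 'IOB', or 'BILUO' for a folder; None if unsupported.

    Hypothetical helper: it only mirrors the file layout described above.
    """
    root = Path(folder)
    if not root.is_dir():
        return None

    # Map lowercase file names to paths so the lookup ignores case.
    files = {p.name.lower(): p for p in root.iterdir() if p.is_file()}

    # IOB and BILUO both need the tokens.csv / tags.csv pair.
    if "tokens.csv" in files and "tags.csv" in files:
        with files["tags.csv"].open(newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                for cell in row:
                    # Assumption: tags inside a cell are whitespace-separated.
                    for tag in cell.split():
                        if tag.startswith(("L-", "U-")):
                            return "BILUO"
        return "IOB"

    # ner_json needs at least one .json file.
    if any(name.endswith(".json") for name in files):
        return "ner_json"

    return None
```

A return value of None suggests the folder holds none of the supported layouts, which is the situation this error describes. The check does not validate the contents of a .json file against the spaCy JSON training format.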