Annotator

It can be useful to annotate a document. You could use annotation to:

  • Create training data
    The system can be easily trained by inputting sample documents, which help the Annotator learn to identify documents and extract data automatically.
  • Extract information
    The Annotator can extract data from both unstructured and structured documents.
  • Export results to the document management system


      Getting started

      Before starting an Annotator project, choose a document from a folder on the List of Projects page.

      Creating a project

      To create an Annotator project, upload a document on the List of Projects page, and then select your document file.

      Opening a project

      To open an existing Annotator project, open the List of Projects page, select Project, and then select your document file.



      Basic Annotator navigation
      Moving between pages

      Press the Up Arrow to move to the previous page and the Down Arrow to move to the next page, or scroll with the mouse wheel.

      Zooming a page

      To zoom, use the zoom controls on the corpus toolbar.

      Searching for words

      To search for words within a document, click Search on the corpus toolbar.

      Canceling an action

      To undo an action, press Ctrl+Z or click Undo on the toolbar.
      To redo something you've undone, press Ctrl+Y (or Ctrl+Shift+Z) or click Redo on the toolbar (the Redo button only appears after you've undone an action).
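
The undo and redo behaviour described above can be pictured with a common two-stack pattern. This is only a minimal sketch under stated assumptions, not the Annotator's actual implementation: the names `Action` and `History` are hypothetical, and clearing the redo list after a new action is an assumption based on how most editors behave.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """A reversible edit (hypothetical type, for illustration only)."""
    do: Callable[[], None]
    undo: Callable[[], None]


class History:
    """Keeps performed actions so they can be undone and redone."""

    def __init__(self) -> None:
        self._done: List[Action] = []    # actions that Ctrl+Z can undo
        self._undone: List[Action] = []  # actions that Ctrl+Y can redo

    @property
    def can_undo(self) -> bool:
        return bool(self._done)

    @property
    def can_redo(self) -> bool:
        # Mirrors the Redo button, which appears only after an undo.
        return bool(self._undone)

    def perform(self, action: Action) -> None:
        action.do()
        self._done.append(action)
        # Assumption: a new action discards anything that could be redone.
        self._undone.clear()

    def undo(self) -> bool:
        """Undo the latest action; return False if there is nothing to undo."""
        if not self._done:
            return False
        action = self._done.pop()
        action.undo()
        self._undone.append(action)
        return True

    def redo(self) -> bool:
        """Redo the latest undone action; return False if there is none."""
        if not self._undone:
            return False
        action = self._undone.pop()
        action.do()
        self._done.append(action)
        return True
```

In this sketch `can_redo` stays False until `undo()` has succeeded at least once, which matches the Redo button appearing only after an action has been undone.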




      The Annotator modes

      The Annotator has two modes:

      Annotation
      Annotation mode is initially enabled. In this mode you can:
      • Identify documents
      • Extract any data from a document
      • Verify data
      • Edit field values

      Constructor (editing)
      To switch to constructor mode, click Edit on the toolbar. In constructor mode you can make the necessary changes to blocks and fields:
      • Change the name of a block
      • Add a new block
      • Delete a block
      • Add or delete a field


      The Annotator toolbar
      A document can have one of three markup statuses:
        • Markup in progress
        • Markup completed
        • Markup approved
        To change the markup status, use the matching button on the toolbar.
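
For readers who process exported results in scripts, the three statuses map naturally onto an enumeration. The sketch below is illustrative only: `MarkupStatus` and `Document` are hypothetical names, and starting a document at "markup in progress" is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class MarkupStatus(Enum):
    """The three markup statuses listed above."""
    IN_PROGRESS = "markup in progress"
    COMPLETED = "markup completed"
    APPROVED = "markup approved"


@dataclass
class Document:
    """Hypothetical stand-in for an annotated document."""
    name: str
    # Assumption: a new document starts as "markup in progress".
    status: MarkupStatus = MarkupStatus.IN_PROGRESS

    def set_status(self, status: MarkupStatus) -> None:
        """Equivalent of clicking the matching status button on the toolbar."""
        if not isinstance(status, MarkupStatus):
            raise TypeError("status must be a MarkupStatus value")
        self.status = status
```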

        To split the document, click Split on the toolbar.
        To see all settings, click Settings on the toolbar.



        The Annotation process

        1. Specify the correct type of document on the toolbar
        2. Specify the recognition area of the text on the document page and select the corresponding field in the block
        3. Recognize and verify the extracted data



        The Annotator structure
        Each block contains fields that describe properties of the document itself (the field values). Each main field has a field type that specifies the kind of data contained in the selected text area on the document page.

        Blocks correspond to the text areas of the document from which data must be captured.

        To create a new block, click Plus on the block's toolbar.
        There are the following types of blocks:
        • List is used to extract text data
        • Table is used to extract data from tables
        Blocks for which no text area is specified have an empty space.

        There are the following types of fields:
        • Text is used to extract text data
        • List is used to extract text data and contains a list of options. It can also contain tips that help you quickly find the right option for the field value
        • Image is used to process images and other objects that were not identified as text during pre-recognition

        To delete the field value, click the current field (or the current text area on the document), and then press Delete (or Backspace).

        To edit the field value, click Edit or double-click the field.

        To verify (confirm) the field value, click Checkmark.
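
The relationship between blocks, fields, and field types can be summarized as a small data model. This is a sketch for orientation only, not the Annotator's internal format: all class and method names are hypothetical, and two behaviours are assumptions, namely that editing a value resets its verification and that list tips work as a prefix filter over the options.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class BlockType(Enum):
    LIST = "list"    # extracts text data
    TABLE = "table"  # extracts data from tables


class FieldType(Enum):
    TEXT = "text"    # plain text data
    LIST = "list"    # text data chosen from a list of options
    IMAGE = "image"  # objects not identified as text


@dataclass
class Field:
    name: str
    field_type: FieldType
    value: Optional[str] = None
    verified: bool = False
    options: List[str] = field(default_factory=list)  # used by LIST fields

    def edit(self, new_value: str) -> None:
        self.value = new_value
        self.verified = False  # assumption: an edited value must be re-verified

    def clear(self) -> None:
        """Equivalent of pressing Delete or Backspace on the field."""
        self.value = None
        self.verified = False

    def verify(self) -> None:
        """Equivalent of clicking Checkmark."""
        if self.value is None:
            raise ValueError("cannot verify an empty field")
        self.verified = True

    def matching_options(self, prefix: str) -> List[str]:
        """Assumed form of the tips: options that start with the typed text."""
        return [o for o in self.options if o.lower().startswith(prefix.lower())]


@dataclass
class Block:
    name: str
    block_type: BlockType
    fields: List[Field] = field(default_factory=list)

    def add_field(self, new_field: Field) -> None:
        self.fields.append(new_field)

    def delete_field(self, name: str) -> None:
        self.fields = [f for f in self.fields if f.name != name]
```

The `Block` methods correspond to the changes available in constructor mode, while the `Field` methods correspond to the edit, delete, and verify actions available in annotation mode.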



        The Annotation Indicators

        Indicators help you understand what changes should be made to a document.

        The Annotator highlights field values in different colors. Each color corresponds to one of the following states:

        • Edit by a robot. Validation failed
        • Edit by a human. Validation failed
        • Edit by a robot. Validation passed
        • Edit by a human. Validation passed
        • Edit by a robot. Possibly wrong value
        • Edit by a human. Possibly wrong value

        If a value is marked as possibly wrong (or a word is underlined in red), the field value contains errors (for example, the full name is not in the database).
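
The six indicator states are the combinations of two independent properties: who made the edit and what validation reported. A minimal sketch of that combination follows. The names `Editor`, `Validation`, `indicator_label`, and `needs_review` are hypothetical, and the highlight colors are deliberately left out because they are not stated here.

```python
from enum import Enum
from itertools import product


class Editor(Enum):
    ROBOT = "a robot"
    HUMAN = "a human"


class Validation(Enum):
    FAILED = "Validation failed"
    PASSED = "Validation passed"
    POSSIBLY_WRONG = "Possibly wrong value"


def indicator_label(editor: Editor, validation: Validation) -> str:
    """Build the indicator text for one combination of editor and result."""
    return f"Edit by {editor.value}. {validation.value}"


def needs_review(validation: Validation) -> bool:
    """Assumption: anything other than a passed validation needs attention."""
    return validation is not Validation.PASSED


# Every indicator state, in the same order as the list above.
ALL_INDICATORS = [indicator_label(e, v) for v, e in product(Validation, Editor)]
```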