
Method and system for automating training of named entity recognition in natural language processing

  • US 10,558,754 B2
  • Filed: 03/29/2017
  • Issued: 02/11/2020
  • Est. Priority Date: 09/15/2016
  • Status: Active Grant
First Claim

1. A method to automate training named entity recognition in natural language processing to build configurable entity definitions, the method comprising:

    receiving at least one input document or one or more entities through an administration module;

    defining a domain for each of the received entities or the at least one input document through the administration module;

    determining one or more entities corresponding to a domain specific entity in the at least one input document;

    generating a training file;

    via the training file, picking a right parser;

    via the training file, extracting content from the input document;

    via the training file, labeling entity ambiguity, whereby a single training file is used to pick the right parser, extract content from the input document, and label entity ambiguity;

    collecting and maintaining, through a knowledge engine, at least one user action in a knowledge repository, wherein the collecting comprises resolution of the entity ambiguity and comprises:

    displaying a plurality of confirmation blocks containing excerpts appearing in the input document, wherein the excerpts contain an unclassified, ambiguous named entity associated with the entity ambiguity and surrounding text as the surrounding text appears in the input document, wherein the unclassified, ambiguous named entity is ambiguous because its domain overlaps with more than one domain,

    displaying a proposed specific domain for the excerpts, wherein a single proposed specific domain is displayed for more than one of the excerpts,

    within the confirmation blocks, displaying user interface elements for confirmation or rejection of a given excerpt out of the excerpts as belonging to the single proposed specific domain,

    receiving activation of one user interface element of the user interface elements, thereby resolving the entity ambiguity by indicating that text in the given excerpt out of the excerpts does or does not belong to the single proposed specific domain, and

    updating the knowledge engine with the resolved entity ambiguity;

    predicting, through the knowledge engine, one or more labelled ambiguous entities;

    fetching, through a training pipeline execution engine, data stored on a document store; and

    associating, through the training pipeline execution engine, each entity with one or more documents based on the fetched data from the document store to build configurable entity definitions;

    wherein the act of generating the training file comprises:

    extracting text from the input document;

    determining a definition of the extracted text to be ambiguous or unambiguous; and

    based on whether the definition of the extracted text is determined to be ambiguous or unambiguous, switching between (a) and (b):

    (a) adding the extracted text to the training file when the definition is determined to be unambiguous, and

    (b) prompting a user to resolve ambiguity, and adding the resolution of the ambiguity to the training file when the definition is determined to be ambiguous.
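The training-file generation step of the claim (extract text, decide whether its definition is ambiguous, then switch between (a) adding it directly and (b) prompting the user) can be illustrated with a minimal sketch. It is not the patented implementation: the lexicon, the `(text, domain)` record format, the substring-based `extract_text`, and the `ask_user` callback are all hypothetical stand-ins, and "ambiguous" is assumed to mean "the text maps to more than one known domain".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical lexicon: surface string -> set of candidate domains.
Lexicon = Dict[str, Set[str]]


@dataclass
class TrainingFile:
    # Hypothetical record format: (text, domain) pairs. The claim does not
    # specify what a training file looks like.
    records: List[Tuple[str, str]] = field(default_factory=list)


def extract_text(document: str, lexicon: Lexicon) -> List[str]:
    # Naive extraction (assumption): every lexicon term found verbatim in the document.
    return [term for term in lexicon if term in document]


def is_ambiguous(text: str, lexicon: Lexicon) -> bool:
    # Assumption: a definition is ambiguous when the text maps to more than one domain.
    return len(lexicon.get(text, set())) > 1


def generate_training_file(
    document: str,
    lexicon: Lexicon,
    ask_user: Callable[[str, Set[str]], Optional[str]],
) -> TrainingFile:
    training_file = TrainingFile()
    for text in extract_text(document, lexicon):
        domains = lexicon[text]
        if not domains:
            # Skip terms with no known domain at all.
            continue
        if not is_ambiguous(text, lexicon):
            # (a) Unambiguous: add the text with its single known domain.
            training_file.records.append((text, next(iter(domains))))
        else:
            # (b) Ambiguous: prompt the user and add the resolution, if it is valid.
            choice = ask_user(text, domains)
            if choice in domains:
                training_file.records.append((text, choice))
    return training_file


if __name__ == "__main__":
    lexicon = {"Apple": {"company", "fruit"}, "Paris": {"city"}}
    tf = generate_training_file(
        "Apple opened an office in Paris.", lexicon, lambda text, domains: "company"
    )
    print(tf.records)  # [('Apple', 'company'), ('Paris', 'city')]
```

Passing the user prompt in as a callback keeps the ambiguity switch testable without a user interface; in an interactive tool, `ask_user` would be whatever widget collects the user's choice.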
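The knowledge-engine steps of the claim (showing excerpts with surrounding text under a single proposed domain, recording each confirm or reject action, then predicting labels for ambiguous entities) might be approximated as below. This is a hedged sketch, not the patent's method: the character-window excerpting, the `decide` callback standing in for the confirmation buttons, and the prediction rule (pick the candidate domain with the highest positive count of confirmations minus rejections) are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


def make_excerpts(document: str, entity: str, window: int = 20) -> List[str]:
    """Return each occurrence of `entity` with up to `window` characters of context per side."""
    excerpts = []
    start = document.find(entity)
    while start != -1:
        lo = max(0, start - window)
        hi = min(len(document), start + len(entity) + window)
        excerpts.append(document[lo:hi])
        start = document.find(entity, start + len(entity))
    return excerpts


@dataclass
class KnowledgeEngine:
    # Hypothetical knowledge repository: (entity, domain) -> [confirmations, rejections].
    repository: Dict[Tuple[str, str], List[int]] = field(
        default_factory=lambda: defaultdict(lambda: [0, 0])
    )

    def record(self, entity: str, proposed_domain: str, confirmed: bool) -> None:
        # Store one user action (confirm or reject) for this entity/domain pair.
        counts = self.repository[(entity, proposed_domain)]
        counts[0 if confirmed else 1] += 1

    def review(
        self,
        document: str,
        entity: str,
        proposed_domain: str,
        decide: Callable[[str, str], bool],
        window: int = 20,
    ) -> None:
        # One proposed domain is shown for every excerpt of the entity; each
        # excerpt gets its own confirm/reject decision (stand-in for UI buttons).
        for excerpt in make_excerpts(document, entity, window):
            self.record(entity, proposed_domain, decide(excerpt, proposed_domain))

    def predict(self, entity: str, candidates: Set[str]) -> Optional[str]:
        # Assumed rule: highest (confirmations - rejections), which must be positive;
        # ties go to the alphabetically first domain. Returns None if no evidence.
        best, best_score = None, 0
        for domain in sorted(candidates):
            confirmed, rejected = self.repository.get((entity, domain), [0, 0])
            score = confirmed - rejected
            if score > best_score:
                best, best_score = domain, score
        return best
```

For example, after confirming both excerpts of "Apple" in "Apple shares rose. I ate an Apple today." as `company` and rejecting `fruit` once, `predict("Apple", {"company", "fruit"})` returns `"company"`. A real system would likely use a trained model rather than raw counts; the counts only make the collect-then-predict loop concrete.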
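The training-pipeline steps of the claim (fetching data from a document store and associating each entity with one or more documents to build entity definitions) can be sketched as a simple inverted index. The `InMemoryDocumentStore` class, the `EntityDefinition` fields, and the verbatim substring match are hypothetical; the claim does not say how the store is queried or how an entity is matched to a document.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class EntityDefinition:
    # Hypothetical shape of a "configurable entity definition".
    name: str
    domain: str
    documents: Set[str] = field(default_factory=set)


class InMemoryDocumentStore:
    """Hypothetical stand-in for the claim's document store (doc_id -> text)."""

    def __init__(self, docs: Dict[str, str]):
        self._docs = dict(docs)

    def fetch_all(self) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the store.
        return dict(self._docs)


def build_entity_definitions(
    store: InMemoryDocumentStore, labelled: Dict[str, str]
) -> Dict[str, EntityDefinition]:
    # `labelled` maps each resolved entity name to its domain.
    docs = store.fetch_all()
    definitions = {name: EntityDefinition(name, domain) for name, domain in labelled.items()}
    for doc_id, text in docs.items():
        for name, definition in definitions.items():
            # Naive association (assumption): the entity appears verbatim in the text.
            if name in text:
                definition.documents.add(doc_id)
    return definitions
```

With documents `d1` ("Apple opened in Paris."), `d2` ("Paris is sunny."), and labels `{"Apple": "company", "Paris": "city"}`, the `Apple` definition links to `{"d1"}` and the `Paris` definition to `{"d1", "d2"}`. Substring matching over-matches (for instance inside longer words), so a production pipeline would match on tokenized or model-tagged spans instead.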

  • 1 Assignment