
Probabilistic tree-structured learning system for extracting contact data from quotes

  • US 9,619,534 B2
  • Filed: 02/24/2011
  • Issued: 04/11/2017
  • Est. Priority Date: 09/10/2010
  • Status: Active Grant
First Claim

1. A method for creating or updating a data set stored as a record in a database, wherein a plurality of data sets are stored in the database, wherein each data set in the plurality of data sets is defined to include a plurality of fields corresponding to a plurality of predefined entities, the method comprising:

    searching through a plurality of documents for current information about the data set;

    upon locating a search result document, in the plurality of documents, containing the current information about the data set, copying and storing a data string having a plurality of tokens from content of the search result document containing the current information about the data set;

    extracting a sequence of tokens corresponding to the data string;

    recognizing a first set of tokens in the sequence of tokens as a first entity based on entity recognition probabilistic scoring derived from a machine evaluation of a training set of entities;

    recognizing a second set of tokens in the sequence of tokens as a second entity based on identifying the first entity as a first node in a tree-like structure and identifying the second entity as a second node in the tree-like structure, the first node connected to the second node by an arc representing a probability that the first entity is followed by the second entity in a probable entity sequence, the first node connected to another node by another arc representing another probability that the first entity is followed by another entity in another probable entity sequence, the tree-like structure created by a machine evaluation of a training set of input strings;

    aligning one or more tokens of the first set of tokens as one of a plurality of probable entities using the probabilistic scoring of the first set of tokens and grammatical rules;

    assigning the aligned one or more tokens to one entity field of the plurality of predefined entity fields of the data set; and

    creating and storing a new record for the data set if none exists, or updating an existing record for the data set, using the assigned aligned one or more tokens.
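
For illustration only, the token-labeling and record-update steps recited above might be sketched in Python as follows. This is not the patented implementation. The training data, label names, smoothing constant, and all function names are hypothetical, and the tree-like structure is simplified to a table of arc probabilities between consecutive entity labels, decoded with a Viterbi-style search. The document search step and the grammatical-rule alignment step are not modeled.

```python
import re
from collections import Counter, defaultdict

# Hypothetical training data (not from the patent).
TRAINING_ENTITIES = [
    ("jane", "FIRST_NAME"), ("john", "FIRST_NAME"),
    ("doe", "LAST_NAME"), ("smith", "LAST_NAME"),
    ("ceo", "TITLE"), ("cto", "TITLE"),
    ("acme", "COMPANY"), ("initech", "COMPANY"),
]
TRAINING_SEQUENCES = [
    ["FIRST_NAME", "LAST_NAME", "TITLE", "COMPANY"],
    ["FIRST_NAME", "LAST_NAME", "TITLE", "COMPANY"],
    ["FIRST_NAME", "LAST_NAME", "COMPANY"],
]
LABELS = ["FIRST_NAME", "LAST_NAME", "TITLE", "COMPANY"]
SMOOTHING = 0.01  # assumed additive smoothing so unseen events keep a small probability


def train_entity_scores(examples):
    """Count how often each token was seen with each entity label."""
    counts = defaultdict(Counter)
    for token, label in examples:
        counts[token][label] += 1
    return counts


def entity_score(counts, token, label):
    """Smoothed probability that a token belongs to an entity label."""
    seen = counts.get(token)
    if not seen:
        return 1.0 / len(LABELS)  # unknown token: every label equally likely
    total = sum(seen.values())
    return (seen[label] + SMOOTHING) / (total + SMOOTHING * len(LABELS))


def train_arcs(sequences):
    """Count label-to-label transitions; each count backs one arc."""
    arcs = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            arcs[prev][nxt] += 1
    return arcs


def arc_probability(arcs, prev, nxt):
    """Smoothed probability that entity `prev` is followed by entity `nxt`."""
    total = sum(arcs[prev].values())
    return (arcs[prev][nxt] + SMOOTHING) / (total + SMOOTHING * len(LABELS))


def label_tokens(tokens, counts, arcs):
    """Viterbi-style search for the most probable label sequence."""
    best = {l: (entity_score(counts, tokens[0], l), [l]) for l in LABELS}
    for token in tokens[1:]:
        step = {}
        for l in LABELS:
            score, path = max(
                ((s * arc_probability(arcs, p, l), pth)
                 for p, (s, pth) in best.items()),
                key=lambda candidate: candidate[0],
            )
            step[l] = (score * entity_score(counts, token, l), path + [l])
        best = step
    return max(best.values(), key=lambda candidate: candidate[0])[1]


def upsert_record(database, tokens, labels):
    """Assign tokens to entity fields, then create or update the record."""
    fields = defaultdict(list)
    for token, label in zip(tokens, labels):
        fields[label].append(token)
    record = {label: " ".join(parts) for label, parts in fields.items()}
    key = (record.get("FIRST_NAME"), record.get("LAST_NAME"))  # assumed record key
    database.setdefault(key, {}).update(record)
    return database[key]


database = {}
counts = train_entity_scores(TRAINING_ENTITIES)
arcs = train_arcs(TRAINING_SEQUENCES)
tokens = re.findall(r"\w+", "Jane Doe, CTO, Initech".lower())
labels = label_tokens(tokens, counts, arcs)
record = upsert_record(database, tokens, labels)
```

In this sketch, a token absent from the entity training set receives a uniform entity score, so its label is decided by the arc probabilities of its neighbors. That is the role the claim gives to the arcs between nodes. Keying records on first and last name is an arbitrary choice made for the example.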
