Probabilistic tree-structured learning system for extracting contact data from quotes
First Claim
1. A method for creating or updating a data set stored as a record in a database, wherein a plurality of data sets are stored in the database, wherein each data set in the plurality of data sets is defined to include a plurality of fields corresponding to a plurality of predefined entities, the method comprising:
searching through a plurality of documents for current information about the data set;
upon locating a search result document, in the plurality of documents, containing the current information about the data set, copying and storing a data string having a plurality of tokens from content of the search result document containing the current information about the data set;
extracting a sequence of tokens corresponding to the data string;
recognizing a first set of tokens in the sequence of tokens as a first entity based on entity recognition probabilistic scoring derived from a machine evaluation of a training set of entities;
recognizing a second set of tokens in the sequence of tokens as a second entity based on identifying the first entity as a first node in a tree-like structure and identifying the second entity as a second node in the tree-like structure, the first node connected to the second node by an arc representing a probability that the first entity is followed by the second entity in a probable entity sequence, the first node connected to another node by another arc representing another probability that the first entity is followed by another entity in another probable entity sequence, the tree-like structure created by a machine evaluation of a training set of input strings;
aligning one or more tokens of the first set of tokens as one of a plurality of probable entities using the probabilistic scoring of the first set of tokens and grammatical rules;
assigning the aligned one or more tokens to one entity field of the plurality of predefined entity fields of the data set; and
creating and storing a new record for the data set if none exists, or updating an existing record for the data set, using the assigned aligned one or more tokens.
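The tree-like structure recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the entity labels (PERSON, TITLE, COMPANY), the training sequences, the function names `build_arcs` and `pick_second_entity`, and the rule of multiplying the arc probability by the recognition score are all assumptions made for illustration. The sketch is also simplified in that each node's arcs depend only on the current entity, whereas the claimed tree may encode longer entity sequences.

```python
from collections import Counter, defaultdict


def build_arcs(training_sequences):
    """Estimate P(next entity | current entity) from labeled entity sequences."""
    counts = defaultdict(Counter)
    for seq in training_sequences:
        # Count every adjacent (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    arcs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        arcs[current] = {nxt: n / total for nxt, n in followers.items()}
    return arcs


def pick_second_entity(arcs, first_entity, recognition_scores):
    """Pick the label with the highest (arc probability * recognition score)."""
    followers = arcs.get(first_entity, {})
    return max(
        recognition_scores,
        key=lambda label: followers.get(label, 0.0) * recognition_scores[label],
    )


# Hypothetical training set of entity sequences (assumed, not from the patent).
training = [
    ["PERSON", "TITLE", "COMPANY"],
    ["PERSON", "TITLE", "COMPANY"],
    ["PERSON", "COMPANY"],
    ["PERSON", "TITLE"],
]
arcs = build_arcs(training)

# Hypothetical recognition scores for the token set that follows a PERSON.
scores = {"TITLE": 0.5, "COMPANY": 0.6}
second = pick_second_entity(arcs, "PERSON", scores)
print(arcs["PERSON"], second)
```

Here the arc from PERSON to TITLE carries more weight than the arc from PERSON to COMPANY, so TITLE is chosen even though COMPANY has the slightly higher standalone recognition score.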
Abstract
Systems and methods for updating data stored in a database, such as contact information. An input string is obtained through a search for timely material associated with the stored contact. The input string is parsed using probabilistic tendencies to extract entities corresponding to those stored with the contact. Secondary entities are used to assist in the identification of the primary entities. The contact is then updated (or added if new) using the extracted primary entities.
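The final step of the abstract, updating the contact or adding it if new, can be sketched as follows. This is a minimal illustration under stated assumptions: the in-memory dictionary standing in for the database, the field names (`name`, `title`, `company`), the sample records, and the function name `upsert_contact` are hypothetical and do not come from the patent.

```python
def upsert_contact(contacts, extracted):
    """Create a contact record if none exists, otherwise update the existing one.

    `contacts` maps a contact name to its record; `extracted` holds the
    entity fields pulled from the input string. Empty extracted values
    are ignored so they do not overwrite stored data.
    """
    key = extracted["name"]
    record = contacts.get(key)
    if record is None:
        contacts[key] = dict(extracted)
        return "created"
    record.update({field: value for field, value in extracted.items() if value})
    return "updated"


# Hypothetical stored contact and extracted entities (assumed sample data).
contacts = {
    "Jane Doe": {"name": "Jane Doe", "title": "VP Sales", "company": "Acme"},
}
status_existing = upsert_contact(
    contacts, {"name": "Jane Doe", "title": "CEO", "company": ""}
)
status_new = upsert_contact(
    contacts, {"name": "John Roe", "title": "CTO", "company": "Initech"}
)
print(status_existing, status_new)
```

Skipping empty values is a design choice of this sketch: an extraction that finds no company should leave the stored company untouched.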
19 Claims
1. A method for creating or updating a data set stored as a record in a database, wherein a plurality of data sets are stored in the database, wherein each data set in the plurality of data sets is defined to include a plurality of fields corresponding to a plurality of predefined entities, the method comprising:
searching through a plurality of documents for current information about the data set;
upon locating a search result document, in the plurality of documents, containing the current information about the data set, copying and storing a data string having a plurality of tokens from content of the search result document containing the current information about the data set;
extracting a sequence of tokens corresponding to the data string;
recognizing a first set of tokens in the sequence of tokens as a first entity based on entity recognition probabilistic scoring derived from a machine evaluation of a training set of entities;
recognizing a second set of tokens in the sequence of tokens as a second entity based on identifying the first entity as a first node in a tree-like structure and identifying the second entity as a second node in the tree-like structure, the first node connected to the second node by an arc representing a probability that the first entity is followed by the second entity in a probable entity sequence, the first node connected to another node by another arc representing another probability that the first entity is followed by another entity in another probable entity sequence, the tree-like structure created by a machine evaluation of a training set of input strings;
aligning one or more tokens of the first set of tokens as one of a plurality of probable entities using the probabilistic scoring of the first set of tokens and grammatical rules;
assigning the aligned one or more tokens to one entity field of the plurality of predefined entity fields of the data set; and
creating and storing a new record for the data set if none exists, or updating an existing record for the data set, using the assigned aligned one or more tokens.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A non-transitory machine-readable medium carrying one or more sequences of instructions for updating information associated with a contact stored in a multi-tenant database system, which instructions, when executed by one or more processors, cause the one or more processors to:
obtain and store a data string having a plurality of tokens in content of a search result from a search for quoted material associated with the contact;
extract a sequence of tokens corresponding to the data string;
recognize a first set of tokens in the sequence of tokens as a first entity based on entity recognition probabilistic scoring derived from machine evaluation of a training set of entities;
recognize a second set of tokens in the sequence of tokens as a second entity based on identifying the first entity as a first node in a tree-like structure and identifying the second entity as a second node in the tree-like structure, the first node connected to the second node by an arc representing a probability that the first entity is followed by the second entity in a probable entity sequence, the first node connected to another node by another arc representing another probability that the first entity is followed by another entity in another probable entity sequence, the tree-like structure created by a machine evaluation of a training set of input strings;
align one or more tokens of the first set of tokens as one of a plurality of probable entities using the probabilistic scoring of the first set of tokens and grammatical rules;
assign the aligned one or more tokens to one entity field of corresponding predefined entity fields of the contact based on the probabilistic scoring and the linguistic cues of the probable secondary entities; and
create and store a new record for the contact if none exists, or update an existing record for the contact, using the assigned aligned one or more tokens.
View Dependent Claims (11, 12, 13, 14, 15)
16. An apparatus for extracting contact data from quotes, wherein a plurality of contacts are stored in a multi-tenant database, the apparatus comprising:
a processor; and
one or more stored sequences of instructions which, when executed by the processor, cause the processor to:
obtain and store a data string having a plurality of tokens in content of a search result from a search for quoted material associated with a contact;
extract a sequence of tokens corresponding to the data string;
recognize a first set of tokens in the sequence of tokens as a first entity based on entity recognition probabilistic scoring derived from a machine evaluation of a training set of entities;
recognize a second set of tokens in the sequence of tokens as a second entity based on identifying the first entity as a first node in a tree-like structure and identifying the second entity as a second node in the tree-like structure, the first node connected to the second node by an arc representing a probability that the first entity is followed by the second entity in a probable entity sequence, the first node connected to another node by another arc representing another probability that the first entity is followed by another entity in another probable entity sequence, the tree-like structure created by a machine evaluation of a training set of input strings;
align one or more tokens of the first set of tokens as one of a plurality of probable entities using the probabilistic scoring of the first set of tokens and grammatical rules;
assign the aligned one or more tokens to one entity field of corresponding predefined entity fields of the contact based on the probabilistic scoring and the linguistic cues of the probable secondary entities; and
create and store a new record for the contact if none exists, or update an existing record for the contact, using the assigned aligned one or more tokens.
View Dependent Claims (17)
18. A method for transmitting code for extracting contact data from quotes in a multi-tenant database system on a transmission medium, the method comprising:
transmitting code to obtain and store a data string having a plurality of tokens in content of a search result from a search for quoted material associated with a contact;
transmitting code to extract a sequence of tokens corresponding to the data string;
transmitting code to recognize a first set of tokens in the sequence of tokens as a first entity based on entity recognition probabilistic scoring derived from a machine evaluation of a training set of entities;
transmitting code to recognize a second set of tokens in the sequence of tokens as a second entity based on identifying the first entity as a first node in a tree-like structure and identifying the second entity as a second node in the tree-like structure, the first node connected to the second node by an arc representing a probability that the first entity is followed by the second entity in a probable entity sequence, the first node connected to another node by another arc representing another probability that the first entity is followed by another entity in another probable entity sequence, the tree-like structure created by a machine evaluation of a training set of input strings;
transmitting code to align one or more tokens of the first set of tokens as one of a plurality of probable entities using the probabilistic scoring of the first set of tokens and grammatical rules;
transmitting code to assign the aligned one or more tokens to one entity field of the plurality of predetermined entity fields of the data set; and
transmitting code to create and store a new record for the data set if none exists, or update an existing record for the data set, using the assigned aligned one or more tokens.
View Dependent Claims (19)
Specification