Leveraging corporal data for data parsing and predicting
Abstract
The techniques discussed herein leverage structure within data of a corpus to parse unstructured data to obtain structured data and/or to predict latent data that is related to the unstructured and/or structured data. In some examples, parsing and/or predicting can be conducted at varying levels of granularity. In some examples, parsing and/or predicting can be iteratively conducted to improve accuracy and/or to expose more hidden data.
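The iterative parsing and predicting mentioned in the abstract can be illustrated with a small sketch. It assumes a toy corpus of label-to-token rows and a simple majority-vote predictor. `consistent`, `predict_round`, and `iterate` are hypothetical names, not taken from the patent. Each round's predictions are fed back in as evidence for the next round, so data that is several hops away from the starting token becomes reachable.

```python
from collections import Counter

# Hypothetical corpus rows; each row links a few labels to tokens.
ROWS = [
    {"city": "seattle", "state": "washington"},
    {"state": "washington", "country": "usa"},
    {"country": "usa", "continent": "north america"},
    {"city": "vancouver", "country": "canada"},
]

def consistent(row, structured):
    """A row is usable evidence if it shares a value with the structure and contradicts none."""
    shares = any(row.get(k) == v for k, v in structured.items())
    conflicts = any(k in row and row[k] != v for k, v in structured.items())
    return shares and not conflicts

def predict_round(structured, rows):
    """One pass: fill each missing label with its most common value among consistent rows."""
    matches = [r for r in rows if consistent(r, structured)]
    new = {}
    for label in sorted({lab for r in matches for lab in r} - set(structured)):
        new[label] = Counter(r[label] for r in matches if label in r).most_common(1)[0][0]
    return new

def iterate(structured, rows, max_rounds=10):
    """Repeat prediction, feeding each round's output back in, until nothing new appears.

    Returns the enriched structure and the number of rounds that added data.
    """
    structured = dict(structured)
    for rounds in range(max_rounds):
        new = predict_round(structured, rows)
        if not new:
            return structured, rounds
        structured.update(new)
    return structured, max_rounds

result, rounds = iterate({"city": "seattle"}, ROWS)
print(rounds, result)
# 3 {'city': 'seattle', 'state': 'washington', 'country': 'usa', 'continent': 'north america'}
```

Starting from only `{"city": "seattle"}`, the loop reaches a fixed point after three productive rounds. The Vancouver row contradicts the current structure in every round, so it is never used as evidence.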
20 Claims
1. A method comprising:
obtaining an unstructured token, the unstructured token being unassociated with a class identifier, attribute identifier, or attribute value;
parsing the unstructured token, the parsing including associating, based at least in part on a probabilistic database derived from a corpus, a class identifier with the unstructured token to obtain a classified token;
predicting that an attribute label or an attribute value is associated with the classified token based at least in part on another probabilistic database derived from the corpus and the class identifier; and
associating the attribute label or the attribute value with the classified token.
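A minimal sketch of the parse-then-predict flow of claim 1, assuming each "probabilistic database" is simply a table of relative frequencies counted from a toy corpus of (token, class, attribute label, attribute value) records. The corpus, the function names, and the argmax decision rule are illustrative assumptions, not the claimed implementation. The second table is keyed by the class identifier chosen during parsing, which is how the class conditions the prediction.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (token, class_id, attribute_label, attribute_value).
CORPUS = [
    ("seattle", "city", "state", "washington"),
    ("seattle", "city", "state", "washington"),
    ("seattle", "city", "country", "usa"),
    ("rainier", "mountain", "state", "washington"),
    ("rainier", "mountain", "elevation_m", "4392"),
    ("rainier", "beer", "style", "lager"),
]

def build_class_db(corpus):
    """First probabilistic database: P(class_id | token) from corpus counts."""
    counts = defaultdict(Counter)
    for token, cls, _, _ in corpus:
        counts[token][cls] += 1
    return {tok: {c: n / sum(ctr.values()) for c, n in ctr.items()}
            for tok, ctr in counts.items()}

def build_attribute_db(corpus):
    """Second probabilistic database: P((label, value) | token, class_id)."""
    counts = defaultdict(Counter)
    for token, cls, label, value in corpus:
        counts[(token, cls)][(label, value)] += 1
    return {key: {lv: n / sum(ctr.values()) for lv, n in ctr.items()}
            for key, ctr in counts.items()}

def parse_and_predict(token, class_db, attr_db):
    token = token.lower()
    if token not in class_db:
        return {"token": token}  # the corpus offers no evidence for this token
    # Parsing: associate the most probable class identifier with the token.
    cls = max(class_db[token], key=class_db[token].get)
    classified = {"token": token, "class": cls}
    # Predicting: the chosen class selects which attribute distribution to use.
    candidates = attr_db.get((token, cls), {})
    if candidates:
        label, value = max(candidates, key=candidates.get)
        classified["attributes"] = {label: value}  # associate label and value
    return classified

class_db = build_class_db(CORPUS)
attr_db = build_attribute_db(CORPUS)
print(parse_and_predict("Seattle", class_db, attr_db))
# {'token': 'seattle', 'class': 'city', 'attributes': {'state': 'washington'}}
```

For an ambiguous token such as `rainier`, the class table resolves it to `mountain` (2 of its 3 corpus records), so only mountain-related attributes are considered during prediction.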
12. A system comprising:
one or more processors;
computer-readable media having stored thereon computer-executable instructions that, when executed by the one or more processors, configure the system to perform operations comprising:
obtaining a token-of-interest (“TOI”);
generating a schema for the TOI based at least in part on relational data of a corpus, the schema including:
a first relation between the TOI and a parsed label classifying the TOI,
a second relation between the parsed label and a predicted label, and
a third relation between the predicted label and a predicted token, the predicted label classifying the predicted token and the predicted token including latent data associated with the TOI.
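The three-relation schema of claim 12 can be pictured as a chain of triples. The sketch below is one hedged interpretation: it assumes the "relational data of a corpus" is a list of rows mapping labels to tokens, and it picks each link by frequency. `Schema`, `generate_schema`, and the relation names `classified_as`, `related_to`, and `classifies` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Schema:
    toi: str
    # (source, relation, target) triples; relation names are hypothetical.
    relations: list = field(default_factory=list)

# Hypothetical relational corpus: each row maps a label (column name) to a token.
ROWS = [
    {"city": "seattle", "state": "washington"},
    {"city": "seattle", "state": "washington", "team": "seahawks"},
    {"city": "spokane", "state": "washington"},
    {"mountain": "rainier", "state": "washington"},
]

def generate_schema(toi, rows):
    hits = [row for row in rows if toi in row.values()]
    if not hits:
        return Schema(toi)  # the corpus says nothing about this TOI
    # First relation: TOI -> parsed label, the label the TOI most often appears under.
    parsed = Counter(label for row in hits for label, tok in row.items() if tok == toi)
    parsed_label = parsed.most_common(1)[0][0]
    schema = Schema(toi, [(toi, "classified_as", parsed_label)])
    # Second relation: parsed label -> predicted label, the label that most often
    # co-occurs with the parsed label anywhere in the corpus.
    co = Counter(label for row in rows if parsed_label in row
                 for label in row if label != parsed_label)
    if not co:
        return schema
    predicted_label = co.most_common(1)[0][0]
    schema.relations.append((parsed_label, "related_to", predicted_label))
    # Third relation: predicted label -> predicted token, taken from rows in which
    # the TOI fills the parsed label; this token is latent data about the TOI.
    tokens = Counter(row[predicted_label] for row in hits
                     if row.get(parsed_label) == toi and predicted_label in row)
    if tokens:
        schema.relations.append((predicted_label, "classifies", tokens.most_common(1)[0][0]))
    return schema

print(generate_schema("seattle", ROWS).relations)
# [('seattle', 'classified_as', 'city'), ('city', 'related_to', 'state'),
#  ('state', 'classifies', 'washington')]
```

Here `washington` never appears in the input; it is surfaced only by walking from the TOI through the parsed and predicted labels.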
17. A method comprising:
tokenizing unstructured data to obtain tokens, the unstructured data lacking relational structure between the unstructured data and identifiers of what the data is or what the data is like;
parsing the tokens, based at least in part on probabilities calculated from a corpus, to form structured data;
predicting, based at least in part on one or more of the tokens and probabilities calculated from a corpus, additional tokens or additional structural information, the additional structural information including an attribute identifier and the additional tokens including an attribute value; and
adding the additional tokens or the additional structural information to the structured data.
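A sketch of the four steps of claim 17 (tokenize, parse, predict, add), assuming a toy corpus of structured rows, a regular-expression tokenizer, and a hypothetical `min_prob` threshold on a predicted value's relative frequency. All names are illustrative, and conditioning predictions on rows that agree with every parsed field is only one simple choice among many.

```python
import re
from collections import Counter, defaultdict

# Hypothetical corpus of structured rows (attribute identifier -> token).
ROWS = [
    {"city": "seattle", "state": "washington", "country": "usa"},
    {"city": "seattle", "state": "washington", "country": "usa"},
    {"city": "spokane", "state": "washington", "country": "usa"},
    {"city": "vancouver", "country": "canada"},
]

def tokenize(text):
    """Split raw text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def label_probabilities(rows):
    """P(label | token), estimated from how often each token fills each label."""
    counts = defaultdict(Counter)
    for row in rows:
        for label, token in row.items():
            counts[token][label] += 1
    return {tok: {lab: n / sum(c.values()) for lab, n in c.items()}
            for tok, c in counts.items()}

def parse(tokens, probs):
    """Attach the most probable label to each known token; unknown tokens are skipped."""
    structured = {}
    for tok in tokens:
        if tok in probs:
            label = max(probs[tok], key=probs[tok].get)
            structured.setdefault(label, tok)  # first token wins if two share a label
    return structured

def predict(structured, rows, min_prob=0.5):
    """Predict missing (attribute identifier, attribute value) pairs from agreeing rows."""
    if not structured:
        return {}  # nothing was parsed, so there is no evidence to condition on
    matches = [r for r in rows if all(r.get(k) == v for k, v in structured.items())]
    extra = {}
    for label in sorted({lab for r in matches for lab in r} - set(structured)):
        value, n = Counter(r[label] for r in matches if label in r).most_common(1)[0]
        if n / len(matches) >= min_prob:
            extra[label] = value
    return extra

def process(text, rows):
    structured = parse(tokenize(text), label_probabilities(rows))
    structured.update(predict(structured, rows))  # add the predicted structure
    return structured

print(process("Seattle, Washington", ROWS))
# {'city': 'seattle', 'state': 'washington', 'country': 'usa'}
```

The `country` identifier and the `usa` value never occur in the input text; they are added because every corpus row that agrees with the parsed fields carries them.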
Specification