Producing datasets for representing terms and objects based on automated learning from text contents
First Claim
1. A computer system for producing a dataset for representing a term or information related to an object, the system comprising:
- one or more processors operable to:
receive a first group of text contents comprising a plurality of text units;
receive, or identify from the text contents, a first term comprising a word or a phrase;
identify one of the plurality of text units comprising a sentence or a phrase containing the first term and a plurality of second terms each comprising a word or a phrase;
identify a relation between the first term and the plurality of second terms in the one of the plurality of text units using a machine-based algorithm based on a distance between the first term and one or more second terms in the one of the plurality of text units, wherein the distance is defined as a number of terms between the first term and one or more second terms in the one of the plurality of text units, or based on a presence or absence of a third term in the one of the plurality of text units, or based on a semantic attribute associated with the first term in the one of the plurality of text units, wherein the semantic attribute includes a semantic role, a semantic attribute type or attribute value, or a meaning of the first term or the one or more second terms;
generate a first score for at least one of the second terms based on the relation;
select one or more of the second terms based on the first score as terms associated with the first term; and
associate the selected terms to the first term to form a dataset.
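The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`tokenize`, `associate_terms`), the stop-word list, the inverse-distance scoring formula, and the `top_k` cutoff are all assumptions chosen for illustration, and the first term is restricted to a single word for simplicity. Only the distance-based branch of the claim is shown, with distance counted as the number of terms between the first term and a second term.

```python
import re

# Hypothetical stop-word list; the claim does not specify one.
STOP_WORDS = frozenset({"the", "a", "an", "of", "is", "has", "and", "with", "in", "to"})


def tokenize(text_unit):
    """Split a text unit (sentence or phrase) into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text_unit.lower())


def associate_terms(text_units, first_term, top_k=3):
    """Score second terms by proximity to first_term and keep the top_k.

    Assumed scoring: each co-occurrence contributes 1 / (1 + distance),
    where distance is the number of terms between the two terms.
    """
    scores = {}
    for unit in text_units:
        tokens = tokenize(unit)
        if first_term not in tokens:
            continue  # only text units containing the first term are used
        anchor = tokens.index(first_term)
        for pos, tok in enumerate(tokens):
            if tok == first_term or tok in STOP_WORDS:
                continue
            distance = abs(pos - anchor) - 1  # terms lying between the pair
            scores[tok] = scores.get(tok, 0.0) + 1.0 / (1 + distance)
    # Select the highest-scoring second terms (ties broken alphabetically).
    selected = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    # Associate the selected terms with the first term to form the dataset.
    return {first_term: {t: scores[t] for t in selected}}


dataset = associate_terms(
    ["The camera has a fast lens", "A camera with good battery"],
    "camera",
    top_k=2,
)
```

In this toy input, "good" sits one term away from "camera" and therefore outranks "battery", "fast", and "lens", which are farther away. A real system would also need phrase-level terms and the other relation signals named in the claim.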
1 Assignment
0 Petitions
Abstract
A system and methods for creating data objects as symbolic or associative representations of terms or objects using machine-based methods are presented. A term can be a word or a phrase, which can also be the name of an object. For a given term, the methods analyze other terms associated with the term, and determine a set of terms or values to be attached to the term to form a dataset, either as a representation of the term, or as information about an object represented by the term, including various properties associated with the object. The methods include obtaining a group of text contents or non-natural language data contents, specifying a target term or symbol, and identifying contextual attributes of the target term or symbol. The contextual attributes include positional and distance attributes, as well as grammatical and semantic attributes.
55 Citations
19 Claims
1. A computer system for producing a dataset for representing a term or information related to an object, the system comprising:
- one or more processors operable to:
receive a first group of text contents comprising a plurality of text units;
receive, or identify from the text contents, a first term comprising a word or a phrase;
identify one of the plurality of text units comprising a sentence or a phrase containing the first term and a plurality of second terms each comprising a word or a phrase;
identify a relation between the first term and the plurality of second terms in the one of the plurality of text units using a machine-based algorithm based on a distance between the first term and one or more second terms in the one of the plurality of text units, wherein the distance is defined as a number of terms between the first term and one or more second terms in the one of the plurality of text units, or based on a presence or absence of a third term in the one of the plurality of text units, or based on a semantic attribute associated with the first term in the one of the plurality of text units, wherein the semantic attribute includes a semantic role, a semantic attribute type or attribute value, or a meaning of the first term or the one or more second terms;
generate a first score for at least one of the second terms based on the relation;
select one or more of the second terms based on the first score as terms associated with the first term; and
associate the selected terms to the first term to form a dataset.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer system for producing a dataset for representing a term or an object, the system comprising:
- one or more processors operable to:
receive a first group of text contents comprising a plurality of text units;
receive, or identify from the text contents, a first term comprising a word or a phrase;
identify one of the plurality of text units comprising a sentence or a phrase containing the first term and one or more second terms each comprising a word or a phrase;
identify a relation between the first term and the one or more second terms in the one of the plurality of text units using a machine-based algorithm based on a distance between the first term and one or more second terms in the one of the plurality of text units, wherein the distance is defined as a number of terms between the first term and one or more second terms in the one of the plurality of text units, or based on a presence or absence of a third term in the one of the plurality of text units, or based on a semantic attribute associated with the first term in the one of the plurality of text units, wherein the semantic attribute includes a semantic role, a semantic attribute type or attribute value, or a meaning of the first term or the one or more second terms;
determine one or more numerical values to represent the relation or a strength of the relation between the first term and the one or more second terms;
collect one or more of the one or more numerical values into a group of numerical values;
associate the group of numerical values to the first term to form a dataset; and
output the dataset, wherein the dataset is used for providing a representation of the first term or an object represented by the first term based on the relation between the first term and other terms other than the first term.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19
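One way to picture the "group of numerical values" in claim 10 is as a vector over a fixed vocabulary of second terms. The sketch below is illustrative only and rests on several assumptions not stated in the claim: the function name `relation_vector`, the whitespace tokenizer, the inverse-distance strength formula, and the use of a small negation list (`NEGATION_TERMS`) as the "third term" whose presence flips the sign of the relation.

```python
# Hypothetical "third terms": their presence in a text unit is assumed
# to negate the relation between the first term and the second terms.
NEGATION_TERMS = frozenset({"not", "no", "never"})


def relation_vector(text_units, first_term, vocabulary):
    """Return a dataset pairing first_term with one numeric value per vocabulary term.

    Assumed strength: 1 / (1 + number of terms between the pair),
    negated when a negation term appears in the same text unit.
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    values = [0.0] * len(vocabulary)
    for unit in text_units:
        tokens = unit.lower().split()
        if first_term not in tokens:
            continue
        anchor = tokens.index(first_term)
        # Presence or absence of a third term decides the sign.
        sign = -1.0 if NEGATION_TERMS & set(tokens) else 1.0
        for pos, tok in enumerate(tokens):
            if pos == anchor or tok not in index:
                continue
            between = abs(pos - anchor) - 1
            values[index[tok]] += sign / (1 + between)
    # Associate the collected values with the first term and output the dataset.
    return {"term": first_term, "vocabulary": list(vocabulary), "values": values}


vector = relation_vector(
    ["the camera is not heavy", "the camera is sharp"],
    "camera",
    ["heavy", "sharp", "battery"],
)
```

Here "heavy" receives a negative value because "not" appears in its sentence, "sharp" receives a positive value, and "battery" stays at zero because it never co-occurs with "camera". The resulting list of numbers plays the role of the representation of the first term.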
Specification