Producing datasets for representing terms and objects based on automated learning from text contents

US 9,880,998 B1
Filed: 11/22/2015
Issued: 01/30/2018
Est. Priority Date: 08/11/2012
Status: Expired due to Fees


1 Assignment


0 Petitions


Abstract

A system and methods for creating data objects as symbolic or associative representations of terms or objects using machine-based methods are presented. A term can be a word or a phrase, which can also be the name of an object. For a given term, the methods analyze other terms associated with the term, and determine a set of terms or values to be attached to the term to form a dataset, either as a representation of the term, or as information about an object represented by the term, including various properties associated with the object. The methods include obtaining a group of text contents or non-natural language data contents, specifying a target term or symbol, and identifying contextual attributes of the target term or symbol. The contextual attributes include positional and distance attributes, as well as grammatical and semantic attributes.
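The positional and distance attributes mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes single-word terms, a naive lowercase tokenizer, and distance measured, as in claim 1, as the number of terms between the target term and a context term. Grammatical and semantic attributes would require a parser and are not covered. The names `ContextAttribute`, `tokenize`, and `context_attributes` are hypothetical and do not come from the patent.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextAttribute:
    """Positional and distance attributes of one context term (hypothetical structure)."""
    term: str
    position: int  # 0-based index of the context term within the text unit
    distance: int  # number of terms strictly between the target term and this term


def tokenize(text_unit: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text_unit.lower())


def context_attributes(text_unit: str, target: str) -> list[ContextAttribute]:
    """Return attributes for every non-target term in a text unit containing the target."""
    tokens = tokenize(text_unit)
    target = target.lower()
    target_positions = [i for i, tok in enumerate(tokens) if tok == target]
    if not target_positions:
        return []  # the text unit does not contain the target term
    attributes = []
    for i, tok in enumerate(tokens):
        if tok == target:
            continue
        # Distance is measured to the nearest occurrence of the target term.
        gap = min(abs(p - i) for p in target_positions) - 1
        attributes.append(ContextAttribute(tok, i, gap))
    return attributes


for attr in context_attributes("The car has a powerful engine", "car"):
    print(attr)
```

For the sample sentence, "the" and "has" are adjacent to "car" (distance 0), while "engine" has three terms between it and "car" (distance 3).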

55 Citations


19 Claims

1. A computer system for producing a dataset for representing a term or information related to an object, the system comprising:
- one or more processors operable to:
  
  receive a first group of text contents comprising a plurality of text units;
  
  receive, or identify from the text contents, a first term comprising a word or a phrase;
  
  identify one of the plurality of text units comprising a sentence or a phrase containing the first term and a plurality of second terms each comprising a word or a phrase;
  
  identify a relation between the first term and the plurality of second terms in the one of the plurality of text units using a machine-based algorithm based on a distance between the first term and one or more second terms in the one of the plurality of text units, wherein the distance is defined as a number of terms between the first term and one or more second terms in the one of the plurality of text units, or based on a presence or absence of a third term in the one of the plurality of text units, or based on a semantic attribute associated with the first term in the one of the plurality of text units, wherein the semantic attribute includes a semantic role, a semantic attribute type or attribute value, or a meaning of the first term or the one or more second terms;
  
  generate a first score for at least one of the second terms based on the relation;
  
  select one or more of the second terms based on the first score as terms associated with the first term; and
  
  associate the selected terms to the first term to form a dataset.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The system of claim 1, wherein the one or more processors are further operable to:
    - output the dataset, wherein the dataset is used for providing a representation of the first term by other terms associated with the first term, or providing information associated with the object represented by the first term.
  - 3. The system of claim 1, wherein the first score is generated based on a number of text units that contain the first term or the one or more second terms, or a number of occurrences of the first term or the one or more second terms in the text units.
  - 4. The system of claim 3, wherein the first score is generated further by dividing the first score by the total number of the text units in the first group of text contents.
  - 5. The system of claim 1, wherein the first score is generated based on an occurrence of the one or more second terms in the text units that do not contain the first term, or based on a number of text units that contain the one or more second terms but do not contain the first term.
  - 6. The system of claim 1, wherein the first score is generated based on whether the one of the plurality of text units is a phrase, a sentence, a paragraph, or a document containing a plurality of sentences or paragraphs.
  - 7. The system of claim 1, wherein the first score is generated based on a grammatical or a positional attribute associated with the first term or the one or more second terms or a term in the context of the one or more second terms, wherein the grammatical attribute includes at least the grammatical roles of subject, predicate, part of a predicate, a modifier or a head of a phrase, or a sub-component of a phrase, and parts of speech, wherein the positional attribute includes at least the position of the term in the text unit.
  - 8. The system of claim 1, wherein the first score is attached to the one or more second terms.
  - 9. The system of claim 8, wherein a function of the first score includes representing a strength of association between the at least one of the second terms and the first term, or between a property or attribute represented by the at least one of the second terms and the object represented by the first term.
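The steps of claim 1, together with the unit-count alternatives of claims 3 to 5, can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: the stop-word list, the `1 / (1 + gap)` decay that turns a distance into a score, the threshold-based selection, and the `specificity` ratio used for claim 5 are all hypothetical choices, as are the function names.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the claims do not specify one.
STOPWORDS = {"the", "a", "an", "of", "is", "has", "and"}


def tokenize(text_unit):
    """Lowercase word tokens with stop words removed (a simplifying assumption)."""
    return [t for t in re.findall(r"[a-z0-9]+", text_unit.lower()) if t not in STOPWORDS]


def distance_scores(text_units, first_term):
    """First score per second term, from the number of terms between it and the first term."""
    first_term = first_term.lower()
    scores = defaultdict(float)
    for unit in text_units:
        tokens = tokenize(unit)
        positions = [i for i, tok in enumerate(tokens) if tok == first_term]
        if not positions:
            continue  # only text units containing the first term contribute
        for i, tok in enumerate(tokens):
            if tok == first_term:
                continue
            gap = min(abs(p - i) for p in positions) - 1  # terms strictly in between
            scores[tok] += 1.0 / (1 + gap)  # hypothetical decay: nearer terms weigh more
    return dict(scores)


def build_dataset(text_units, first_term, threshold):
    """Select second terms whose score meets the threshold and attach them to the first term."""
    scores = distance_scores(text_units, first_term)
    selected = sorted((t for t, s in scores.items() if s >= threshold),
                      key=lambda t: (-scores[t], t))
    return {first_term: [(t, round(scores[t], 3)) for t in selected]}


def unit_count_scores(text_units, first_term, second_term):
    """Unit-count alternatives in the spirit of claims 3 to 5."""
    first_term, second_term = first_term.lower(), second_term.lower()
    token_sets = [set(tokenize(u)) for u in text_units]
    total = len(token_sets)
    both = sum(1 for s in token_sets if first_term in s and second_term in s)
    second_only = sum(1 for s in token_sets if second_term in s and first_term not in s)
    co_occurrence = both / total if total else 0.0  # claims 3-4: divide by total units
    # Claim 5: discount units with the second term but not the first (hypothetical ratio).
    specificity = both / (both + second_only) if both + second_only else 0.0
    return co_occurrence, specificity


units = [
    "The car has a powerful engine",
    "The car engine needs oil",
    "A red car is fast",
    "The engine needs fuel",
]
print(build_dataset(units, "car", threshold=1.0))
print(unit_count_scores(units, "car", "engine"))
```

With a threshold of 1.0, "engine" (1.5, from two text units) ranks above "fast", "powerful", and "red" (1.0 each), while "needs" and "oil", which appear further from "car", fall below the cutoff. The last text unit lacks "car", so it contributes only to the claim 5 style count of units containing "engine" without the first term.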

10. A computer system for producing a dataset for representing a term or an object, the system comprising:
- one or more processors operable to:
  
  receive a first group of text contents comprising a plurality of text units;
  
  receive, or identify from the text contents, a first term comprising a word or a phrase;
  
  identify one of the plurality of text units comprising a sentence or a phrase containing the first term and one or more second terms each comprising a word or a phrase;
  
  identify a relation between the first term and the one or more second terms in the one of the plurality of text units using a machine-based algorithm based on a distance between the first term and one or more second terms in the one of the plurality of text units, wherein the distance is defined as a number of terms between the first term and one or more second terms in the one of the plurality of text units, or based on a presence or absence of a third term in the one of the plurality of text units, or based on a semantic attribute associated with the first term in the one of the plurality of text units, wherein the semantic attribute includes a semantic role, a semantic attribute type or attribute value, or a meaning of the first term or the one or more second terms;
  
  determine one or more numerical values to represent the relation or a strength of the relation between the first term and the one or more second terms;
  
  collect one or more of the one or more numerical values into a group of numerical values;
  
  associate the group of numerical values to the first term to form a dataset; and
  
  output the dataset, wherein the dataset is used for providing a representation of the first term or an object represented by the first term based on the relation between the first term and terms other than the first term.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18, 19):
  - 11. The system of claim 10, wherein the one or more processors are further operable to:
    - assemble, based on the relation or based on the numerical values, one or more of the one or more second terms into a group of second terms; and
    - associate the group of second terms to the first term to form the dataset.
  - 12. The system of claim 11, wherein the dataset is further used for providing a representation of the first term by other terms, or providing a representation of an object represented by the first term, wherein the object comprises a thing or a concept, topic, or attribute, wherein the group of second terms represents properties associated with the object.
  - 13. The system of claim 10, wherein the one or more numerical values are determined based on a number of text units that contain the first term or the one or more second terms, or a number of occurrences of the first term or the one or more second terms in the text units.
  - 14. The system of claim 13, wherein the one or more numerical values are determined further by dividing the one or more numerical values by the total number of text units in the first group of text contents.
  - 15. The system of claim 10, wherein the one or more numerical values are determined based on a location of the first term or the one or more second terms in the text units.
  - 16. The system of claim 10, wherein the one or more numerical values are determined based on whether the one of the plurality of text units is a phrase, a sentence, a paragraph, or a document containing a plurality of sentences or paragraphs.
  - 17. The system of claim 11, wherein at least one of the one or more second terms in the dataset is associated with one of the one or more numerical values.
  - 18. The system of claim 17, wherein the at least one of the one or more second terms is collected based on the one of the one or more numerical values.
  - 19. The system of claim 17, wherein a function of the one or more of the numerical values includes representing a strength of association between the at least one of the one or more second terms and the first term, or between a property or attribute represented by the at least one of the one or more second terms and the object represented by the first term.
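Claim 10 and its dependents describe a dataset of numerical values rather than a plain term list. A minimal Python sketch, assuming pre-tokenized text units tagged with their type, might compute each value as a count of units shared with the first term, weighted by unit type (claims 13 and 16) and divided by the total number of units (claim 14), then collect the second terms whose values meet a threshold (claims 11 and 18). The `UNIT_WEIGHTS` values, the threshold, and the names `TermDataset`, `as_vector`, and `term_dataset` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical weights per text-unit type (claim 16 names the types but gives no values).
UNIT_WEIGHTS = {"phrase": 1.0, "sentence": 0.8, "paragraph": 0.5, "document": 0.2}


@dataclass
class TermDataset:
    """Dataset for one first term (hypothetical structure)."""
    term: str
    values: dict[str, float]  # numerical value per second term (claims 10, 17)
    properties: list[str] = field(default_factory=list)  # selected second terms (claims 11, 18)

    def as_vector(self, vocabulary: list[str]) -> list[float]:
        """Group of numerical values in a fixed vocabulary order; 0.0 means no relation found."""
        return [self.values.get(t, 0.0) for t in vocabulary]


def term_dataset(units: list[tuple[str, set[str]]], first_term: str,
                 threshold: float) -> TermDataset:
    """units holds (unit_type, terms) pairs that are already tokenized (an assumption)."""
    total = len(units)
    weighted_counts: dict[str, float] = {}
    for unit_type, terms in units:
        if first_term not in terms:
            continue  # only units containing the first term contribute
        weight = UNIT_WEIGHTS[unit_type]
        for term in terms - {first_term}:
            weighted_counts[term] = weighted_counts.get(term, 0.0) + weight
    # Claim 14: divide by the total number of text units in the group.
    values = {t: c / total for t, c in weighted_counts.items()}
    # Claims 11 and 18: collect second terms whose value meets the threshold.
    properties = sorted((t for t, v in values.items() if v >= threshold),
                        key=lambda t: (-values[t], t))
    return TermDataset(first_term, values, properties)


units = [
    ("sentence", {"apple", "fruit", "red"}),
    ("phrase", {"apple", "fruit"}),
    ("paragraph", {"apple", "tree", "red"}),
    ("document", {"banana", "fruit"}),
]
ds = term_dataset(units, "apple", threshold=0.3)
print(ds.properties)
print(ds.as_vector(["fruit", "red", "tree", "banana"]))
```

Here "fruit" (0.45) and "red" (0.325) are kept as properties of "apple", "tree" (0.125) falls below the threshold, and "banana" gets 0.0 in the vector because it never shares a text unit with "apple".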

Current Assignee: Linfo IP LLC (Pueblo Nuevo LLC)
Original Assignee: Guangsheng Zhang
Inventors: Zhang, Guangsheng
Primary Examiner(s): Villecco, John
Application Number: US14/948,321
Time in Patent Office: 800 Days

CPC Class Codes
- G06F 16/313: Selection or weighting of t...
- G06F 16/36: Creation of semantic tools,...
- G06F 40/211: Syntactic parsing, e.g. bas...
- G06F 40/247: Thesauruses; Synonyms
- G06F 40/253: Grammatical analysis; Style...
- G06F 40/284: Lexical analysis, e.g. toke...
- G06F 40/30: Semantic analysis
