Producing datasets for representing terms and objects based on automated learning from text contents

  • US 9,880,998 B1
  • Filed: 11/22/2015
  • Issued: 01/30/2018
  • Est. Priority Date: 08/11/2012
  • Status: Expired due to Fees
First Claim

1. A computer system for producing a dataset for representing a term or information related to an object, the system comprising:

  • one or more processors operable to:

    receive a first group of text contents comprising a plurality of text units;

    receive, or identify from the text contents, a first term comprising a word or a phrase;

    identify one of the plurality of text units comprising a sentence or a phrase containing the first term and a plurality of second terms each comprising a word or a phrase;

    identify a relation between the first term and the plurality of second terms in the one of the plurality of text units using a machine-based algorithm based on a distance between the first term and one or more second terms in the one of the plurality of text units, wherein the distance is defined as a number of terms between the first term and one or more second terms in the one of the plurality of text units, or based on a presence or absence of a third term in the one of the plurality of text units, or based on a semantic attribute associated with the first term in the one of the plurality of text units, wherein the semantic attribute includes a semantic role, a semantic attribute type or attribute value, or a meaning of the first term or the one or more second terms;

    generate a first score for at least one of the second terms based on the relation;

    select one or more of the second terms based on the first score as terms associated with the first term; and

    associate the selected terms to the first term to form a dataset.
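
The steps recited in the claim can be illustrated with a minimal Python sketch. It is only one possible reading, not the patentee's implementation, and it covers only the distance-based alternative: the distance is the number of terms between the first term and a second term, as the claim defines it. Everything else is an assumption made for illustration: the names `tokenize` and `build_dataset`, the inverse-distance score `1 / (1 + distance)`, the small stopword list, the `top_k` cutoff, and the restriction to single-word terms (the claim also allows phrases).

```python
import re
from collections import defaultdict

# Hypothetical stopword list; the claim does not specify one.
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "is", "and", "to", "in", "has", "can", "be"}
)


def tokenize(text_unit):
    """Split a text unit (e.g. a sentence) into lowercase single-word terms."""
    return re.findall(r"[a-z0-9]+", text_unit.lower())


def build_dataset(text_units, first_term, top_k=3):
    """Associate the top_k highest-scoring second terms with first_term."""
    target = first_term.lower()
    scores = defaultdict(float)
    for unit in text_units:
        tokens = tokenize(unit)
        # Positions of the first term in this text unit.
        positions = [i for i, tok in enumerate(tokens) if tok == target]
        if not positions:
            continue  # only text units containing the first term are used
        for j, tok in enumerate(tokens):
            if tok == target or tok in STOPWORDS:
                continue
            # Distance = number of terms between the first and second term.
            distance = min(abs(j - i) for i in positions) - 1
            # Assumed scoring rule: closer terms contribute more.
            scores[tok] += 1.0 / (1 + distance)
    # Rank by score (descending), breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    selected = [term for term, _ in ranked[:top_k]]
    return {first_term: selected}


if __name__ == "__main__":
    units = ["The camera has a sharp lens.", "A camera lens can be expensive."]
    print(build_dataset(units, "camera", top_k=2))
```

In this sketch, scores accumulate across text units, so a second term that repeatedly appears close to the first term ranks higher. The other relation types in the claim (presence or absence of a third term, semantic attributes) would need a different scoring function and are not modeled here.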
