
Method and system for analyzing text

  • US 9,292,491 B2
  • Filed: 06/13/2014
  • Issued: 03/22/2016
  • Est. Priority Date: 11/04/2008
  • Status: Active Grant
First Claim

1. A method for predicting a value of a variable associated with a target word or set of words, performed by an apparatus comprising at least one computer and comprising the steps of:

  • the apparatus collecting a text corpus comprising a set of words that include the target word,
  • the apparatus generating a representation of the text corpus,
  • the at least one computer creating a semantic space for the set of words, based on the representation of the text corpus,
  • the at least one computer defining, for a location in the semantic space, a value of the variable,
  • the at least one computer estimating, for the target word, a value of the variable, based on the semantic space and the defined variable value of the location in the semantic space,
  • calculating, by the at least one computer, a predicted value of the target word, on basis of the semantic space, the defined variable value of the location in the semantic space and the estimated variable value of the target word, and
  • statistically testing if two sets of words or two sets of documents of the text corpora differ in semantic representation,
    wherein the step of statistically testing comprises:

    i) calculating a first vector to represent a mean location in the semantic space for a first of the two sets of words or documents;

    ii) calculating a second vector to represent a mean location in the semantic space for a second of the two sets of words or documents;

    iii) calculating a distance between the first and second vectors;

    iv) repeating the steps i), ii), and iii) above while assigning the words randomly to the first of the two sets of words or documents and to the second of the two sets of words or documents;

    v) counting a percentage of occasions when the distance for the randomly assigned words is larger than when the distance is based on the non-randomly assigned words; and

    vi) providing the counted percentage as a probability for whether the two sets of words or documents differ in semantic representation.
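The claim does not say how the semantic space is built or how the variable is estimated. The sketch below is one possible reading under stated assumptions, not the patented implementation: the "representation of the text corpus" is taken to be a word-by-document count matrix, the "semantic space" an LSA-style truncated SVD of that matrix, and the variable is defined at the locations of a few labeled words. The target word's value is then estimated as a cosine-similarity-weighted average of those labeled values; this collapses the claim's separate "estimating" and "calculating" steps into one function. The names `build_semantic_space` and `predict_value`, the `dims` parameter, and the weighting scheme are all hypothetical.

```python
import numpy as np


def build_semantic_space(docs, dims=2):
    """Return (word -> row index, unit-length word vectors).

    Assumption: an LSA-style space built from a word-by-document
    count matrix with log weighting and a truncated SVD.
    """
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.split():
            counts[index[w], j] += 1
    weighted = np.log1p(counts)  # damp very frequent words
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    k = min(dims, len(s))
    vectors = u[:, :k] * s[:k]  # word locations in the reduced space
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zero vectors
    return index, vectors / norms


def predict_value(target, labeled, index, vectors):
    """Estimate the variable for `target` from values defined at labeled words.

    Hypothetical scheme: average of the labeled values, weighted by the
    non-negative cosine similarity between each labeled word and the target.
    """
    t = vectors[index[target]]
    words = list(labeled)
    sims = np.array([max(float(vectors[index[w]] @ t), 0.0) for w in words])
    values = np.array([labeled[w] for w in words], dtype=float)
    if sims.sum() == 0:
        return float(values.mean())  # no similar labeled word: use the mean
    return float(sims @ values / sims.sum())


if __name__ == "__main__":
    docs = ["happy joy smile", "joy smile laugh", "joy laugh",
            "sad tears cry", "tears cry grief"]
    index, vectors = build_semantic_space(docs, dims=2)
    labeled = {"happy": 1.0, "sad": -1.0}  # variable values at two locations
    print(predict_value("laugh", labeled, index, vectors))  # near +1
    print(predict_value("grief", labeled, index, vectors))  # near -1
```

In the toy corpus, "laugh" only co-occurs with the "happy" documents and "grief" only with the "sad" ones, so each inherits the value of the labeled word it shares a region of the space with. A regression from semantic dimensions to the variable would be an equally plausible alternative reading.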
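Steps i) through vi) describe a permutation test on mean locations. A minimal sketch, assuming each word or document is already a row vector in the semantic space and that "distance" means Euclidean distance between the two mean vectors (the claim does not name a metric). The number of random reassignments, `n_perm`, the `seed` parameter, and the function name `semantic_difference_test` are hypothetical. The function returns a fraction in [0, 1]; multiplying by 100 gives the percentage named in step vi).

```python
import numpy as np


def semantic_difference_test(vecs_a, vecs_b, n_perm=1000, seed=0):
    """Permutation test for whether two sets of vectors differ in mean location.

    vecs_a, vecs_b: arrays of shape (n_items, n_dims).
    Returns the fraction of random reassignments whose mean-vector distance
    is strictly larger than the observed distance.
    """
    vecs_a = np.asarray(vecs_a, dtype=float)
    vecs_b = np.asarray(vecs_b, dtype=float)
    # Steps i)-iii): mean vector of each set and the distance between them.
    observed = np.linalg.norm(vecs_a.mean(axis=0) - vecs_b.mean(axis=0))
    pooled = np.vstack([vecs_a, vecs_b])
    n_a = len(vecs_a)
    rng = np.random.default_rng(seed)
    larger = 0
    for _ in range(n_perm):
        # Step iv): randomly reassign items to two sets of the original sizes.
        order = rng.permutation(len(pooled))
        a, b = pooled[order[:n_a]], pooled[order[n_a:]]
        # Step v): count reassignments whose distance exceeds the observed one.
        if np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)) > observed:
            larger += 1
    # Step vi): report the proportion as the probability.
    return larger / n_perm


if __name__ == "__main__":
    set_a = [[1, 0], [2, 0], [3, 0]]
    set_b = [[0, 1], [0, 2], [0, 3]]
    print(semantic_difference_test(set_a, set_b))  # small: sets are well separated
```

A small returned value means random splits rarely separate the sets as far as the real split does, which is evidence that the two sets differ in semantic representation. Sampling a fixed number of permutations, rather than enumerating every possible split, keeps the cost linear in `n_perm`.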

  • 2 Assignments