Methods for analysis and evaluation of the semantic content of a writing based on vector length

US 6,356,864 B1
Filed: 07/23/1998
Issued: 03/12/2002
Est. Priority Date: 07/25/1997
Status: Expired due to Term

Abstract

The present invention is a methodology for analyzing and evaluating a sample text, such as an essay or document. This methodology compares the sample text to one or more reference essays, documents, or text segments within a reference essay or document. The methodology analyzes the amount of subject-matter information in the sample text, analyzes the relevance of that subject-matter information, and evaluates the semantic coherence of the sample. This methodology presumes there is an underlying, latent semantic structure in the usage of words. The method parses and stores text objects and text segments from the sample text and reference text into a two-dimensional data matrix. A weight is computed for each text object and applied to each data matrix cell value. The method performs a singular value decomposition on the data matrix, which produces three trained matrices. The method computes a vector representation of the sample text and reference text using the three trained matrices. The methodology compares the sample text to the reference text by computing the cosine between the vector representation of the sample text and the vector representation of the standard reference text. Alternatively, the dot product is used to compare the sample text to the standard reference text. A grade is assigned to the sample text based on the degree of similarity between the sample text and the standard reference text.
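
The first steps in the abstract (parsing text into a two-dimensional data matrix and weighting each cell) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: whitespace tokenization, treating each input string as one text segment, and the log-entropy weighting scheme are all assumptions, since the abstract says only that a weight is computed for each text object. The function name `build_weighted_matrix` is hypothetical.

```python
import math
from collections import Counter

def build_weighted_matrix(segments):
    """Return (vocab, matrix) with one row per word and one column per segment."""
    counts = [Counter(seg.lower().split()) for seg in segments]
    vocab = sorted(set().union(*counts))
    n = len(segments)
    matrix = []
    for word in vocab:
        row = [c[word] for c in counts]
        total = sum(row)
        # Entropy-based global weight (assumed scheme): 1 for a word found in
        # a single segment, 0 for a word spread evenly over every segment.
        if n > 1:
            entropy = -sum((f / total) * math.log(f / total) for f in row if f)
            global_w = 1.0 - entropy / math.log(n)
        else:
            global_w = 1.0
        # Local weight: log(1 + raw count), which dampens repeated words.
        matrix.append([global_w * math.log(1 + f) for f in row])
    return vocab, matrix

vocab, matrix = build_weighted_matrix([
    "the heart pumps blood",
    "blood carries oxygen",
    "the lungs supply oxygen",
])
```

In this toy corpus, "heart" occurs in one segment and keeps its full local weight, while "blood" occurs in two segments and is down-weighted.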

45 Claims

1. A method operable in a computing device for grading an ungraded sample text relative to at least one standard text comprising the steps of:
- generating trained matrices with said at least one standard text; and
  
  determining a degree of similarity between said ungraded sample text and said at least one standard text using said trained matrices;
  
  determining a vector length corresponding to said ungraded sample text; and
  
  assigning a grade to said ungraded sample text based on said vector length.
2. The method of claim 1 wherein the step of determining a vector length further comprises the steps of:
3. The method of claim 1 wherein said at least one standard text comprises at least one pregraded essay and wherein the method further comprises the steps of:
- determining a subset of said at least one pregraded essay that is most similar to said ungraded sample text;
  
  computing an average of said subset; and
  
  assigning said average as a grade for said ungraded sample text.
4. The method of claim 1 wherein said at least one standard text is parsed into predefined portions and said ungraded sample text is parsed into predefined portions, and wherein the step of determining a degree of similarity between said ungraded sample text and said at least one standard text further comprises the step of determining the degree of similarity between said predefined portion of said at least one standard text and said predefined portion of said ungraded sample text, and wherein the method further comprises the step of:
- assigning a grade to said ungraded sample text based on said degree of similarity between said predefined portion of said ungraded sample text and said predefined portion of said at least one standard text.
5. The method of claim 1 wherein said at least one standard text comprises a plurality of additional ungraded sample texts, and wherein the step of determining a degree of similarity between said ungraded sample text and said at least one standard text comprises the step of:
- determining a relative ranking of said plurality of additional ungraded sample texts and said ungraded sample text.
6. The method of claim 1 further comprising the step of:
- parsing said plurality of reference texts into a first set of text object vectors and a first set of segment vectors.
7. The method of claim 6 wherein said step of generating trained matrices with said plurality of reference texts comprises the steps of:
- generating a data matrix using said first set of text object vectors and said first set of segment vectors;
  
  using singular value decomposition to decompose said data matrix to create said trained matrices; and
  
  reducing the dimensions in said trained matrices.
8. The method of claim 7 wherein said step of generating a data matrix comprises the steps of:
- creating a data matrix using said first set of text object vectors as a first dimension and said first set of segment vectors as a second dimension; and
  
  applying a weighted value to each cell in said data matrix.
9. The method of claim 8 wherein step of creating a data matrix further comprises the steps of:
- parsing said at least one standard text into a second set of text object vectors and a second set of segment vectors; and
  
  including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
10. The method of claim 1 wherein said step of determining a degree of similarity comprises the steps of:
- generating a pseudo-object vector representation of said ungraded sample text;
  
  computing a vector representation for said at least one standard text;
  
  comparing said pseudo-object vector representation and said vector representation; and
  
  assigning a grade to said ungraded sample text based on said comparison.
11. The method of claim 10 wherein said step of generating a pseudo-object vector representation comprises the steps of:
- parsing said ungraded sample text into a first set of text object vectors; and
  
  computing the average of said first set of text object vectors by averaging the sum of each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
12. The method of claim 10 wherein said step of computing a vector representation comprises the steps of:
- parsing said at least one standard text into a first set of text object vectors; and
  
  computing the average of said first set of text object vectors by averaging the sum of each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
13. The method of claim 10 wherein the method step of generating a pseudo-object vector representation comprises the steps of:
- parsing said ungraded sample text into a first set of text object vectors; and
  
  computing the sum of said first set of text object vectors by summing each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
14. The method of claim 10 wherein the step of comparing said pseudo-object vector representation and said vector representation further comprises the step of:
- computing a cosine between said pseudo-object vector representation and said vector representation.
15. The method of claim 10 wherein the step of comparing said pseudo-object vector representation and said vector representation further comprises the step of:
- computing a dot product between said pseudo-object vector representation and said vector representation.
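
Claims 10 through 15 describe forming a pseudo-object vector by averaging or summing text object vectors, then comparing it with a standard text's vector by cosine or dot product. The sketch below illustrates that idea under assumptions: `term_vectors` is a hypothetical lookup of reduced-dimension word vectors with made-up values standing in for rows of the trained matrices, words are split on whitespace, and unknown words are skipped.

```python
import math

def pseudo_object(text, term_vectors, average=True):
    """Combine the term vectors of the known words in `text` into one vector."""
    rows = [term_vectors[w] for w in text.lower().split() if w in term_vectors]
    if not rows:
        raise ValueError("no known words in text")
    summed = [sum(col) for col in zip(*rows)]
    return [x / len(rows) for x in summed] if average else summed

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0

# Toy 2-dimensional term vectors (illustrative values only).
term_vectors = {
    "heart": [0.9, 0.1],
    "blood": [0.8, 0.3],
    "oxygen": [0.4, 0.7],
    "lungs": [0.1, 0.9],
}
sample = pseudo_object("heart pumps blood", term_vectors)
standard = pseudo_object("blood carries oxygen", term_vectors)
print(round(cosine(sample, standard), 3), round(dot(sample, standard), 3))
```

The cosine depends only on direction, so it gives the same value whether the pseudo-object is an average or a sum. The dot product also reflects vector length, so the two forms give different values.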

16. The method operable in a computing system for determining similarity of an ungraded sample text relative to at least one pregraded sample text comprising the steps of:
- generating trained matrices with a plurality of reference texts;
  
  determining a vector length corresponding to said ungraded sample text; and
  
  determining a degree of similarity between said ungraded sample text and said at least one pregraded sample text using said trained matrices and using said vector length.
17. The method of claim 16 wherein said step of generating trained matrices comprises the steps of:
18. The method of claim 17 wherein said step of creating a data matrix further comprises the steps of:
- parsing said at least one pregraded sample text into a second set of text object vectors and a second set of segment vectors; and
  
  including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
19. The method of claim 16 wherein said step of determining a degree of similarity comprises the steps of:
- generating a pseudo-object vector representation of said ungraded sample text;
  
  computing a vector representation for said at least one pregraded sample text;
  
  computing a cosine between said pseudo-object vector representation and said vector representation;
  
  determining a subset of said at least one pregraded sample text that is most closely matching said ungraded sample text;
  
  computing the weighted average of said subset; and
  
  assigning said weighted average as a grade for said ungraded sample text.
20. The method of claim 16 wherein said step of determining a degree of similarity comprises the steps of:
- generating a pseudo-object vector representation of said ungraded sample text;
  
  computing a vector representation for said at least one pregraded sample text;
  
  computing a dot product value between said pseudo-object vector representation and said vector representation;
  
  determining a subset of said at least one pregraded sample text that is most closely matching said ungraded sample text;
  
  computing the weighted average of said subset; and
  
  assigning said weighted average as a grade for said ungraded sample text.
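
Claims 19 and 20 grade a sample from the weighted average of the pregraded texts that most closely match it. The following sketch shows one way this could look. It assumes the vectors are already computed, that the subset is the `k` most similar texts, and that each grade is weighted by its similarity score; the claims fix none of these details, and `grade_by_neighbors` is a hypothetical name.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    denom = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / denom if denom else 0.0

def grade_by_neighbors(sample_vec, pregraded, k=2):
    """pregraded: list of (vector, grade) pairs. Returns a similarity-weighted grade."""
    scored = sorted(
        ((cosine(sample_vec, vec), grade) for vec, grade in pregraded),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total <= 0:
        raise ValueError("no positively similar pregraded text")
    return sum(sim * grade for sim, grade in scored) / total

pregraded = [([1.0, 0.0], 5.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
grade = grade_by_neighbors([1.0, 0.0], pregraded, k=2)
```

Here the sample matches the grade-5 text exactly and the grade-3 text partially, so the result falls between 3 and 5, closer to 5. Replacing `cosine` with a dot product would give the claim 20 variant.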

21. A method operable in a computing system for determining similarity of a portion of ungraded sample text relative to a portion of at least one standard text, comprising the steps of:
- generating trained matrices with a plurality of reference texts;
  
  determining a vector length corresponding to said portion of said ungraded sample text; and
  
  determining a degree of similarity between said portion of ungraded sample text and said portion of at least one standard text using said trained matrices.
22. The method of claim 21 wherein said step of generating trained matrices comprises the steps of:
23. The method of claim 22 wherein said step of creating a data matrix further comprises:
- parsing said portion of said at least one standard text into a second set of text object vectors and a second set of segment vectors; and
  
  including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
25. The method of claim 21 wherein the step of determining a degree of similarity between said portion of ungraded sample text and said portion of said at least one standard text further comprises the steps of:
- generating a pseudo-object vector representation for said portion of said ungraded sample text;
  
  computing a vector representation for said portion of said at least one standard text;
  
  computing a dot product value between said pseudo-object vector representation and said vector representation;
  
  adding said dot product value computed between each said portion of ungraded sample text and each said portion of said at least one reference text; and
  
  assigning a grade based on said addition of said dot product values.

24. The method of claim 21 wherein the step of determining a degree of similarity between said portion of ungraded sample text and said portion of said at least one standard text comprises the steps of:
- generating a pseudo-object vector representation for said portion of said ungraded sample text;
  
  computing a vector representation for said portion of said at least one standard text;
  
  computing a cosine value between said pseudo-object vector representation and said vector representation;
  
  adding said cosine computed between each said portion of ungraded sample text and each said portion of said at least one reference text; and
  
  assigning a grade based on said addition of said cosine values.
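
Claims 24 and 25 add up the cosine or dot product values computed between portions of the sample text and portions of the reference text. A minimal sketch of that summation follows. It assumes the portion vectors are already available and reads "each portion" as every sample portion paired with every reference portion, which is one possible interpretation. The claims do not say how the sum maps to a grade, so the sketch returns the raw total.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    denom = math.sqrt(dot(u, u) * dot(v, v))
    return dot(u, v) / denom if denom else 0.0

def summed_similarity(sample_portions, standard_portions, measure=cosine):
    """Add the similarity of every sample portion to every standard portion."""
    return sum(
        measure(s, r) for s in sample_portions for r in standard_portions
    )

sample_portions = [[1.0, 0.0], [0.0, 2.0]]
standard_portions = [[1.0, 0.0], [0.0, 1.0]]
cos_total = summed_similarity(sample_portions, standard_portions)
dot_total = summed_similarity(sample_portions, standard_portions, measure=dot)
```

The second sample portion is twice as long as its matching reference portion. That length raises the dot product total but leaves the cosine total unchanged.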

26. A method operable in a computing system for relative ranking of a plurality of sample texts, comprising the steps of:
- generating trained matrices with a plurality of reference texts;
  
  determining a vector length corresponding to each of said plurality of sample texts; and
  
  determining a degree of similarity between said plurality of sample texts using said trained matrices and using said vector length.
27. The method of claim 26 wherein said step of generating trained matrices comprises the steps of:
28. The method of claim 27 wherein said step of creating a data matrix further comprises:
- parsing each of said plurality of sample texts into a second set of text object vectors and a second set of segment vectors; and
  
  including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
29. The method of claim 26 wherein said step of determining a degree of similarity between said plurality of sample texts comprises the steps of:
- generating a vector representation for each said sample text in said plurality of sample texts;
  
  computing a plurality of cosine values between each said vector representation for each said sample text in said plurality of sample texts;
  
  creating a matrix of said plurality of cosine values wherein indicia of each said sample text of said plurality of sample texts is a first dimension of said matrix of said plurality of cosines values and indicia of each said sample text of said plurality of sample texts is a second dimension of said matrix of said plurality of cosine values;
  
  converting said cosine values within said matrix of said plurality of cosine values into distances; and
  
  assigning a ranking to each said sample text of said plurality of sample texts based on relative values derived from single dimensional scaling of said distances.
30. The method of claim 26 wherein said step of determining a degree of similarity between said plurality of sample texts comprises the steps of:
- generating a vector representation for each said sample text in said plurality of sample texts;
  
  computing a plurality of dot product values between each said vector representation for each said sample text in said plurality of sample texts;
  
  creating a matrix of said plurality of dot product values wherein indicia of each said sample text of said plurality of sample texts is a first dimension of said matrix of said plurality of dot product values and indicia of each said sample text of said plurality of sample texts is a second dimension of said matrix of said plurality of dot product values;
  
  converting said dot product values within said matrix of said plurality of dot product values into distances; and
  
  assigning a ranking to each said sample text of said plurality of sample texts based on relative values derived from single dimensional scaling of said distances.
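
Claims 29 and 30 rank texts by building a pairwise similarity matrix, converting it to distances, and applying single dimensional scaling. The sketch below illustrates the cosine variant under assumptions the claims do not specify: the distance is taken as `1 - cosine`, and the one-dimensional coordinates come from classical (Torgerson) scaling, with the leading eigenvector found by plain power iteration to keep the example free of dependencies. The function name is hypothetical.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    denom = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return num / denom if denom else 0.0

def scale_to_one_dimension(vectors, iterations=500):
    """Place each text on a line using classical scaling of (1 - cosine) distances."""
    n = len(vectors)
    # Squared pairwise distances (distance = 1 - cosine is an assumption).
    d2 = [[(1.0 - cosine(u, v)) ** 2 for v in vectors] for u in vectors]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into an inner-product matrix.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector of b.
    x = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        y = [sum(b[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(value * value for value in y))
        if norm == 0:
            break
        x = [value / norm for value in y]
    eigenvalue = sum(x[i] * sum(b[i][j] * x[j] for j in range(n)) for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * value for value in x]

vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
coords = scale_to_one_dimension(vectors)
ranking = sorted(range(len(vectors)), key=lambda i: coords[i])
```

The orientation of the recovered axis is arbitrary, so a real system would need some anchor, such as a text of known quality, to decide which end of the line is the high end. Power iteration also finds the eigenvalue of largest magnitude, which can be negative when the distances are not Euclidean. A production version would use a numerical library's eigensolver.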

31. A method operable in a computing system for analysis and evaluation of the amount of relevant knowledge an ungraded sample text contains comprising the steps of:
- generating trained matrices with a plurality of reference texts;
  
  computing a vector length corresponding to said ungraded sample text; and
  
  assigning a grade representing the amount of relevant knowledge to said ungraded sample text based on the vector length of said ungraded sample text using said trained matrices.
32. The method of claim 31 wherein the step of generating trained matrices with a plurality of reference texts comprises the steps of:
33. The method of claim 31 wherein said step of assigning a grade based on the amount of relevant knowledge contained in said ungraded sample text comprises the steps of:
- generating a pseudo-object vector representation for said ungraded sample text; and
  
  computing a vector length of said pseudo-object representation.
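
Claims 31 through 33 grade the amount of relevant knowledge from the vector length of the pseudo-object representation. The length itself is the Euclidean norm. The claims do not state how a length becomes a grade, so `grade_from_length` below is a purely hypothetical mapping that rescales the length linearly against a calibration set, shown only to make the idea concrete.

```python
import math

def vector_length(vec):
    """Euclidean length (norm) of a vector."""
    return math.sqrt(sum(x * x for x in vec))

def grade_from_length(length, calibration, low=1.0, high=5.0):
    """Hypothetical mapping: rescale length linearly onto [low, high] using
    the shortest and longest lengths in a calibration set."""
    shortest, longest = min(calibration), max(calibration)
    if longest == shortest:
        return (low + high) / 2
    position = (length - shortest) / (longest - shortest)
    position = min(1.0, max(0.0, position))  # clamp out-of-range lengths
    return low + position * (high - low)

# Pseudo-object vector formed by summing term vectors (toy values).
sample_vec = [3.0, 4.0]
length = vector_length(sample_vec)  # 5.0
grade = grade_from_length(length, calibration=[1.0, 9.0])
```

When the pseudo-object is a sum of term vectors, a text containing more terms that point in a consistent direction produces a longer vector. That is the sense in which length can stand in for quantity of relevant content.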

34. A computer readable storage medium tangibly embodying programmed instructions for performing a method for grading an ungraded sample text relative to at least one standard text, the method comprising the steps of:
- generating trained matrices with said at least one standard text;
  
  determining a degree of similarity between said ungraded sample text and said at least one standard text using said trained matrices;
  
  determining a vector length corresponding to said ungraded sample text; and
  
  assigning a grade to said ungraded sample text based on said vector length.
35. The storage medium of claim 34 wherein the method step of determining a vector length further comprises the steps of:
36. The storage medium of claim 34 wherein said at least one standard text comprises at least one pregraded essay and wherein the method further comprises the steps of:
- determining a subset of said at least one pregraded essay that is most similar to said ungraded sample text;
  
  computing a weighted average of said subset; and
  
  assigning said weighted average as a grade for said ungraded sample text.
37. The storage medium of claim 34 wherein said at least one standard text is parsed into predefined portions and said ungraded sample text is parsed into predefined portions, and wherein the step of determining a degree of similarity between said ungraded sample text and said at least one standard text further comprises the step of determining the degree of similarity between said predefined portion of said at least one standard text and said predefined portion of said ungraded sample text, and wherein the method further comprises the step of:
- assigning a grade to said ungraded sample text based on said degree of similarity between said predefined portion of said ungraded sample text and said predefined portion of said at least one standard text.
38. The storage medium of claim 34 wherein said at least one standard text comprises a plurality of additional ungraded sample texts, and wherein the step of determining a degree of similarity between said ungraded sample text and said at least one standard text comprises the step of:
- determining a relative ranking of said plurality of additional ungraded sample texts and said ungraded sample text.
39. The storage medium of claim 34 wherein the method further comprises the step of:
- parsing said plurality of reference texts into a first set of text object vectors and a first set of segment vectors.
40. The storage medium of claim 39 wherein the method step of generating trained matrices with said plurality of reference texts comprises the steps of:
- generating a data matrix using said first set of text object vectors and said first set of segment vectors;
  
  using singular value decomposition to decompose said data matrix to create said trained matrices; and
  
  reducing the dimensions in said trained matrices.
41. The storage medium of claim 40 wherein the method step of generating a data matrix comprises the steps of:
- creating a data matrix using said first set of text object vectors as a first dimension and said first set of segment vectors as a second dimension; and
  
  applying a weighted value to each cell in said data matrix.
42. The storage medium of claim 41 wherein the method step of creating a data matrix further comprises the steps of:
- parsing said at least one standard text into a second set of text object vectors and a second set of segment vectors; and
  
  including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
43. The storage medium of claim 34 wherein the method step of determining a degree of similarity comprises the steps of:
- generating a pseudo-object vector representation of said ungraded sample text;
  
  computing a vector representation for said at least one standard text;
  
  comparing said pseudo-object vector representation and said vector representation; and
  
  assigning a grade to said ungraded sample text based on said comparison.
44. The storage medium of claim 43 wherein the method step of generating a pseudo-object vector representation comprises the steps of:
- parsing said ungraded sample text into a first set of text object vectors; and
  
  computing the average of said first set of text object vectors by averaging the sum of each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
45. The storage medium of claim 43 wherein the method step of computing a vector representation comprises the steps of:
- parsing said at least one standard text into a first set of text object vectors; and
  
  computing the average of said first set of text object vectors by averaging the sum of each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
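
Claims 7 and 40 recite decomposing the data matrix with singular value decomposition and then reducing the dimensions of the resulting trained matrices. The sketch below shows a truncated decomposition that keeps only the `k` largest singular values. It uses power iteration with deflation solely to stay self-contained; that numerical method is an assumption of this example, not something the claims specify, and a practical system would call an SVD routine from a numerical library. The name `truncated_svd` is hypothetical.

```python
import math

def truncated_svd(a, k, iterations=300):
    """Return (u, s, v): u has one k-dim row per row of a, v one per column of a."""
    rows, cols = len(a), len(a[0])
    # Gram matrix g = a^T a; its eigenvectors are the right singular vectors of a.
    g = [[sum(a[r][i] * a[r][j] for r in range(rows)) for j in range(cols)]
         for i in range(cols)]
    s, u_cols, v_cols = [], [], []
    for _component in range(k):
        v = [1.0] * cols
        exhausted = False
        for _step in range(iterations):  # power iteration on the deflated g
            w = [sum(g[i][j] * v[j] for j in range(cols)) for i in range(cols)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                exhausted = True  # nothing left to extract
                break
            v = [x / norm for x in w]
        if exhausted:
            break
        av = [sum(a[r][j] * v[j] for j in range(cols)) for r in range(rows)]
        sigma = math.sqrt(sum(x * x for x in av))
        s.append(sigma)
        u_cols.append([x / sigma for x in av])
        v_cols.append(v)
        # Deflate so the next pass finds the next-largest singular value.
        for i in range(cols):
            for j in range(cols):
                g[i][j] -= sigma * sigma * v[i] * v[j]
    u = [[col[r] for col in u_cols] for r in range(rows)]
    v_rows = [[col[c] for col in v_cols] for c in range(cols)]
    return u, s, v_rows

a = [[3.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
u, s, v = truncated_svd(a, k=2)
# Rebuild the matrix from the three factors to check the decomposition.
approx = [[sum(u[r][d] * s[d] * v[c][d] for d in range(len(s)))
           for c in range(len(a[0]))] for r in range(len(a))]
```

Choosing `k` smaller than the rank of the matrix discards the weakest dimensions, which corresponds to the dimension reduction step in the claims. With text objects as rows, each row of `u` scaled by `s` gives one text object's coordinates in the reduced space.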

Current Assignee: The Board of Regents of the University of Colorado (University of Colorado System)
Original Assignee: University Technology Corporation
Inventors: Foltz, Peter William; Rehder, Robert Ernest; Landauer, Thomas K.; Laham, Robert Darrell II; Kintsch, Walter
Primary Examiner(s): Thomas, Joseph
Application Number: US09/121,450
Time in Patent Office: 1,328 Days
Field of Search: 704/1, 704/9, 704/10, 707/530, 707/531, 707/532, 707/1, 707/2, 707/6, 707/100, 707/101, 707/104, 434/322, 434/350, 434/352, 434/118, 434/353, 434/362
US Class Current: 704/1
CPC Class Codes: G06F 40/30 Semantic analysis
