Methods for analysis and evaluation of the semantic content of a writing based on vector length
Abstract
The present invention is a methodology for analyzing and evaluating a sample text, such as an essay or document. The methodology compares the sample text to one or more reference essays or documents, or to text segments within a reference essay or document. It analyzes the amount of subject-matter information in the sample text, analyzes the relevance of that subject-matter information, and evaluates the semantic coherence of the sample. The methodology presumes there is an underlying, latent semantic structure in the usage of words. The method parses and stores text objects and text segments from the sample text and reference text into a two-dimensional data matrix. A weight is computed for each text object and applied to each data matrix cell value. The method performs a singular value decomposition on the data matrix, which produces three trained matrices. The method computes a vector representation of the sample text and reference text using the three trained matrices. The methodology compares the sample text to the reference text by computing the cosine between the vector representation of the sample text and the vector representation of the standard reference text. Alternatively, the dot product is used to compare the sample text to the standard reference text. A grade is assigned to the sample text based on the degree of similarity between the sample text and the standard reference text.
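The decomposition and comparison described in the abstract can be written out as follows. This is a sketch in conventional latent semantic analysis notation, not notation taken from the patent: the symbols X (weighted data matrix), T, S, D (the three trained matrices), k (number of retained dimensions) and q (weighted text-object counts of a new text) are assumptions, and the fold-in line shows only one common convention.

```latex
\begin{align}
X &= T\,S\,D^{\mathsf T}
  && \text{(singular value decomposition of the weighted data matrix)}\\
X &\approx T_k\,S_k\,D_k^{\mathsf T}
  && \text{(keep only the $k$ largest singular values)}\\
\hat d &= q^{\mathsf T}\,T_k\,S_k^{-1}
  && \text{(vector for a text that was not in $X$)}\\
\cos(\hat d_a,\hat d_b) &= \frac{\hat d_a\cdot\hat d_b}{\lVert\hat d_a\rVert\,\lVert\hat d_b\rVert}
  && \text{(degree of similarity between two texts)}
\end{align}
```

Replacing the cosine with the plain dot product in the last line gives the alternative comparison the abstract mentions.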
45 Claims
-
1. A method operable in a computing device for grading an ungraded sample text relative to at least one standard text comprising the steps of:
-
generating trained matrices with said at least one standard text; and
determining a degree of similarity between said ungraded sample text and said at least one standard text using said trained matrices;
determining a vector length corresponding to said ungraded sample text; and
assigning a grade to said ungraded sample text based on said vector length.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
generating a pseudo-object vector representation for said ungraded sample text; and
computing a vector length of said pseudo-object vector representation.
-
-
3. The method of claim 1 wherein said at least one standard text comprises at least one pregraded essay and wherein the method further comprises the steps of:
-
determining a subset of said at least one pregraded essay that is most similar to said ungraded sample text;
computing an average of said subset; and
assigning said average as a grade for said ungraded sample text.
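The steps of claim 3 can be illustrated with a minimal Python sketch. The function name `grade_from_nearest`, the subset size `k`, and the sample numbers are hypothetical; the claim does not say how large the subset is or how the similarities are obtained.

```python
def grade_from_nearest(similarities, grades, k=3):
    """Average the grades of the k pregraded essays most similar to the sample.

    similarities[i] is the degree of similarity between the ungraded sample
    and pregraded essay i; grades[i] is that essay's grade. The subset size
    k is an assumed parameter.
    """
    if len(similarities) != len(grades) or not grades:
        raise ValueError("need one similarity and one grade per pregraded essay")
    # Most similar pregraded essays first.
    ranked = sorted(zip(similarities, grades), key=lambda pair: pair[0], reverse=True)
    subset = [grade for _, grade in ranked[:k]]
    return sum(subset) / len(subset)


# Hypothetical similarities and grades for four pregraded essays.
print(grade_from_nearest([0.91, 0.40, 0.85, 0.10], [5, 2, 4, 1], k=2))  # 4.5
```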
-
-
4. The method of claim 1
wherein said at least one standard text is parsed into predefined portions and said ungraded sample text is parsed into predefined portions, and wherein the step of determining a degree of similarity between said ungraded sample text and said at least one standard text further comprises the step of determining the degree of similarity between said predefined portion of said at least one standard text and said predefined portion of said ungraded sample text, and wherein the method further comprises the step of: assigning a grade to said ungraded sample text based on said degree of similarity between said predefined portion of said ungraded sample text and said predefined portion of said at least one standard text.
-
5. The method of claim 1
wherein said at least one standard text comprises a plurality of additional ungraded sample texts, and wherein the step of determining a degree of similarity between said ungraded sample text and said at least one standard text comprises the step of: determining a relative ranking of said plurality of additional ungraded sample texts and said ungraded sample text.
-
6. The method of claim 1 further comprising the step of:
parsing said plurality of reference texts into a first set of text object vectors and a first set of segment vectors.
-
7. The method of claim 6 wherein said step of generating trained matrices with said plurality of reference texts comprises the steps of:
-
generating a data matrix using said first set of text object vectors and said first set of segment vectors;
using singular value decomposition to decompose said data matrix to create said trained matrices; and
reducing the dimensions in said trained matrices.
-
-
8. The method of claim 7 wherein said step of generating a data matrix comprises the steps of:
-
creating a data matrix using said first set of text object vectors as a first dimension and said first set of segment vectors as a second dimension; and
applying a weighted value to each cell in said data matrix.
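Claim 8 does not name a weighting scheme. The sketch below assumes log-entropy weighting, a scheme commonly used in latent semantic analysis, purely for illustration; the function name `log_entropy_weight` and the layout of `counts` (rows are text objects, columns are segments) are likewise assumptions.

```python
import math


def log_entropy_weight(counts):
    """Weight each cell of a text-object-by-segment count matrix.

    counts[i][j] is how often text object i occurs in segment j.
    Cell weight = log(1 + count) * (1 + entropy term), where the entropy
    term is 0 for an object concentrated in one segment and -1 for an
    object spread evenly over all segments.
    """
    if not counts:
        return []
    n_segments = len(counts[0])
    weighted = []
    for row in counts:
        total = sum(row)
        entropy = 0.0
        if total > 0 and n_segments > 1:
            for c in row:
                if c > 0:
                    p = c / total
                    entropy += p * math.log(p)
            entropy /= math.log(n_segments)
        global_weight = 1.0 + entropy
        weighted.append([global_weight * math.log(1 + c) for c in row])
    return weighted


# Object 0 occurs only in segment 0; object 1 is spread evenly.
counts = [[2, 0], [1, 1]]
weighted = log_entropy_weight(counts)
```

Under this assumed scheme the evenly spread object receives a weight near zero in every cell, so it contributes little to the decomposition that follows.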
-
-
9. The method of claim 8 wherein said step of creating a data matrix further comprises the steps of:
-
parsing said at least one standard text into a second set of text object vectors and a second set of segment vectors; and
including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
-
-
10. The method of claim 1 wherein said step of determining a degree of similarity comprises the steps of:
-
generating a pseudo-object vector representation of said ungraded sample text;
computing a vector representation for said at least one standard text;
comparing said pseudo-object vector representation and said vector representation; and
assigning a grade to said ungraded sample text based on said comparison.
-
-
11. The method of claim 10 wherein said step of generating a pseudo-object vector representation comprises the steps of:
-
parsing said ungraded sample text into a first set of text object vectors; and
computing the average of said first set of text object vectors by averaging the sum of each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
-
-
12. The method of claim 10 wherein said step of computing a vector representation comprises the steps of:
-
parsing said at least one standard text into a first set of text object vectors; and
computing the average of said first set of text object vectors by averaging the sum of each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
-
-
13. The method of claim 10 wherein the method step of generating a pseudo-object vector representation comprises the steps of:
-
parsing said ungraded sample text into a first set of text object vectors; and
computing the sum of said first set of text object vectors by summing each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
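Claims 11 through 13 build a vector for a whole text by averaging or summing the vectors of its text objects. A minimal sketch, assuming the trained matrices have already been reduced to a lookup table `object_vectors` (a hypothetical name) that maps each known text object to its vector:

```python
def pseudo_object_vector(text_objects, object_vectors, average=True):
    """Combine the trained vectors of the text objects found in a text.

    text_objects: parsed text objects (here, words) of the text.
    object_vectors: hypothetical lookup from text object to its vector.
    average=True gives the averaged form, average=False the summed form.
    Text objects missing from the trained space are skipped.
    """
    known = [object_vectors[t] for t in text_objects if t in object_vectors]
    if not known:
        raise ValueError("no text object of this text is in the trained space")
    dims = len(known[0])
    total = [sum(vec[d] for vec in known) for d in range(dims)]
    if average:
        return [x / len(known) for x in total]
    return total


# Toy two-dimensional space with made-up values.
object_vectors = {"heart": [0.8, 0.1], "blood": [0.6, 0.3], "valve": [0.4, 0.5]}
words = "the heart pumps blood".split()
summed = pseudo_object_vector(words, object_vectors, average=False)
averaged = pseudo_object_vector(words, object_vectors, average=True)
```

The summed form grows with the number of recognized text objects, while the averaged form does not, which matters when vector length is later used for grading.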
-
-
14. The method of claim 10 wherein the step of comparing said pseudo-object vector representation and said vector representation further comprises the step of:
computing a cosine between said pseudo-object vector representation and said vector representation.
-
15. The method of claim 10 wherein the step of comparing said pseudo-object vector representation and said vector representation further comprises the step of:
computing a dot product between said pseudo-object vector representation and said vector representation.
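The two comparisons in claims 14 and 15 differ in how they treat vector length. The sketch below uses made-up two-dimensional vectors to show the difference; the helper names are illustrative only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


standard = [2.0, 2.0]
short_sample = [1.0, 1.0]   # same direction as the standard, short vector
long_sample = [3.0, 3.0]    # same direction as the standard, long vector

cos_short, cos_long = cosine(short_sample, standard), cosine(long_sample, standard)
dot_short, dot_long = dot(short_sample, standard), dot(long_sample, standard)
```

Both samples have a cosine of essentially 1 with the standard because they point the same way, but their dot products are 4.0 and 12.0, so the dot product also rewards the longer vector.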
-
16. A method operable in a computing system for determining similarity of an ungraded sample text relative to at least one pregraded sample text comprising the steps of:
-
generating trained matrices with a plurality of reference texts;
determining a vector length corresponding to said ungraded sample text; and
determining a degree of similarity between said ungraded sample text and said at least one pregraded sample text using said trained matrices and using said vector length.
View Dependent Claims (17, 18, 19, 20)
parsing said plurality of reference texts into a first set of text object vectors and a first set of segment vectors;
creating a data matrix using said first set of text object vectors as a first dimension and using said first set of segment vectors as a second dimension;
applying a weighted value to each cell within said data matrix;
using singular value decomposition to decompose said data matrix to create trained matrices; and
reducing the dimensions in said trained matrices.
-
-
18. The method of claim 17 wherein said step of creating a data matrix further comprises the steps of:
-
parsing said at least one pregraded sample text into a second set of text object vectors and a second set of segment vectors; and
including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
-
-
19. The method of claim 16 wherein said step of determining a degree of similarity comprises the steps of:
-
generating a pseudo-object vector representation of said ungraded sample text;
computing a vector representation for said at least one pregraded sample text;
computing a cosine between said pseudo-object vector representation and said vector representation;
determining a subset of said at least one pregraded sample text that is most closely matching said ungraded sample text;
computing the weighted average of said subset; and
assigning said weighted average as a grade for said ungraded sample text.
-
-
20. The method of claim 16 wherein said step of determining a degree of similarity comprises the steps of:
-
generating a pseudo-object vector representation of said ungraded sample text;
computing a vector representation for said at least one pregraded sample text;
computing a dot product value between said pseudo-object vector representation and said vector representation;
determining a subset of said at least one pregraded sample text that is most closely matching said ungraded sample text;
computing the weighted average of said subset; and
assigning said weighted average as a grade for said ungraded sample text.
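Claims 19 and 20 end with a weighted average over the closest pregraded texts but do not state the weights. The sketch below assumes each grade is weighted by its similarity to the sample (cosine or dot product); that choice, the function name `weighted_grade`, and the subset size `k` are assumptions.

```python
def weighted_grade(similarities, grades, k=3):
    """Similarity-weighted average grade of the k closest pregraded texts.

    similarities[i] is the cosine (or dot product) between the ungraded
    sample and pregraded text i. Using the similarity itself as the weight
    is an assumed choice.
    """
    if len(similarities) != len(grades) or not grades:
        raise ValueError("need one similarity and one grade per pregraded text")
    ranked = sorted(zip(similarities, grades), key=lambda pair: pair[0], reverse=True)[:k]
    weight_sum = sum(sim for sim, _ in ranked)
    if weight_sum <= 0:
        raise ValueError("similarities of the closest texts must sum to a positive value")
    return sum(sim * grade for sim, grade in ranked) / weight_sum


# Hypothetical values: the two closest texts have grades 5 and 2.
grade = weighted_grade([0.9, 0.3, 0.6], [5, 1, 2], k=2)
```

With these numbers the result is about 3.8, pulled toward the grade of the closer text.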
-
-
21. A method operable in a computing system for determining similarity of a portion of ungraded sample text relative to a portion of at least one standard text, comprising the steps of:
-
generating trained matrices with a plurality of reference texts;
determining a vector length corresponding to said portion of said ungraded sample text; and
determining a degree of similarity between said portion of ungraded sample text and said portion of at least one standard text using said trained matrices.
View Dependent Claims (22, 23, 25)
parsing said plurality of reference texts into a first set of text object vectors and a first set of segment vectors;
creating a data matrix using said first set of text object vectors as a first dimension and said first set of segment vectors as a second dimension;
applying a weighted value to each cell within said data matrix;
using singular value decomposition to decompose said data matrix to create trained matrices; and
reducing the dimensions in said trained matrices.
-
-
23. The method of claim 22 wherein said step of creating a data matrix further comprises:
-
parsing said portion of said at least one standard text into a second set of text object vectors and a second set of segment vectors; and
including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
-
-
25. The method of claim 21 wherein the step of determining a degree of similarity between said portion of ungraded sample text and said portion of said at least one standard text further comprises the steps of:
-
generating a pseudo-object vector representation for said portion of said ungraded sample text;
computing a vector representation for said portion of said at least one standard text;
computing a dot product value between said pseudo-object vector representation and said vector representation;
adding said dot product value computed between each said portion of ungraded sample text and each said portion of said at least one reference text; and
assigning a grade based on said addition of said dot product values.
-
-
24. The method of claim 21 wherein the step of determining a degree of similarity between said portion of ungraded sample text and said portion of said at least one standard text comprises the steps of:
-
generating a pseudo-object vector representation for said portion of said ungraded sample text;
computing a vector representation for said portion of said at least one standard text;
computing a cosine value between said pseudo-object vector representation and said vector representation;
adding said cosine computed between each said portion of ungraded sample text and each said portion of said at least one reference text; and
assigning a grade based on said addition of said cosine values.
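The portion-wise comparison of claims 24 and 25 can be sketched as a sum over every pairing of a sample portion with a standard portion. The function name `portion_score` and the toy vectors are hypothetical, and the final step of turning the total into a grade is left out because the claims do not specify it.

```python
import math


def cosine(a, b):
    """Cosine between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def portion_score(sample_portions, standard_portions):
    """Add the cosines between each sample portion and each standard portion."""
    return sum(cosine(s, r) for s in sample_portions for r in standard_portions)


# Two portions of a sample text and two portions of a standard text.
sample_portions = [[1.0, 0.0], [0.0, 1.0]]
standard_portions = [[1.0, 0.0], [1.0, 1.0]]
total = portion_score(sample_portions, standard_portions)
```

Swapping `cosine` for a dot product gives the variant in claim 25.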
-
-
26. A method operable in a computing system for relative ranking of a plurality of sample texts, comprising the steps of:
-
generating trained matrices with a plurality of reference texts;
determining a vector length corresponding to each of said plurality of sample texts; and
determining a degree of similarity between said plurality of sample texts using said trained matrices and using said vector length.
View Dependent Claims (27, 28, 29, 30)
parsing said plurality of reference texts into a first set of text object vectors and a first set of segment vectors;
creating a data matrix using said first set of text object vectors as a first dimension and said first set of segment vectors as a second dimension;
applying a weighted value to each cell within said data matrix;
using singular value decomposition to decompose said data matrix to create trained matrices; and
reducing the dimensions in said trained matrices.
-
-
28. The method of claim 27 wherein said step of creating a data matrix further comprises:
-
parsing each of said plurality of sample texts into a second set of text object vectors and a second set of segment vectors; and
including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
-
-
29. The method of claim 26 wherein said step of determining a degree of similarity between said plurality of sample texts comprises the steps of:
-
generating a vector representation for each said sample text in said plurality of sample texts;
computing a plurality of cosine values between each said vector representation for each said sample text in said plurality of sample texts;
creating a matrix of said plurality of cosine values wherein indicia of each said sample text of said plurality of sample texts is a first dimension of said matrix of said plurality of cosines values and indicia of each said sample text of said plurality of sample texts is a second dimension of said matrix of said plurality of cosine values;
converting said cosine values within said matrix of said plurality of cosine values into distances; and
assigning a ranking to each said sample text of said plurality of sample texts based on relative values derived from single dimensional scaling of said distances.
-
-
30. The method of claim 26 wherein said step of determining a degree of similarity between said plurality of sample texts comprises the steps of:
-
generating a vector representation for each said sample text in said plurality of sample texts;
computing a plurality of dot product values between each said vector representation for each said sample text in said plurality of sample texts;
creating a matrix of said plurality of dot product values wherein indicia of each said sample text of said plurality of sample texts is a first dimension of said matrix of said plurality of dot product values and indicia of each said sample text of said plurality of sample texts is a second dimension of said matrix of said plurality of dot product values;
converting said dot product values within said matrix of said plurality of dot product values into distances; and
assigning a ranking to each said sample text of said plurality of sample texts based on relative values derived from single dimensional scaling of said distances.
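The ranking procedure of claims 29 and 30 can be sketched as follows. Two choices here are assumptions rather than anything the claims state: cosines are converted to distances as 1 minus the cosine, and the single dimensional scaling is done with classical (Torgerson) scaling, using power iteration to find the dominant axis. Only the cosine variant is shown.

```python
import math


def cosines_to_distances(cosines):
    """Assumed conversion: distance = 1 - cosine."""
    return [[1.0 - c for c in row] for row in cosines]


def one_dim_scaling(dist, iters=200):
    """Place items on one axis from a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector of B.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]


# Hypothetical cosine matrix for three sample texts.
cosines = [[1.0, 0.9, 0.4],
           [0.9, 1.0, 0.5],
           [0.4, 0.5, 1.0]]
coords = one_dim_scaling(cosines_to_distances(cosines))
ranking = sorted(range(len(coords)), key=lambda i: coords[i])
```

The direction of the recovered axis is arbitrary, so a real ranking would need some anchor, such as a text of known quality, to decide which end is the high end.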
-
-
31. A method operable in a computing system for analysis and evaluation of the amount of relevant knowledge an ungraded sample text contains comprising the steps of:
-
generating trained matrices with a plurality of reference texts;
computing a vector length corresponding to said ungraded sample text; and
assigning a grade representing the amount of relevant knowledge to said ungraded sample text based on the vector length of said ungraded sample text using said trained matrices.
View Dependent Claims (32, 33)
parsing said plurality of reference texts into a first set of text object vectors and a first set of segment vectors;
creating a data matrix using said first set of text object vectors as a first dimension and said first set of segment vectors as a second dimension;
applying a weighted value to each cell in said data matrix;
using singular value decomposition to decompose said data matrix to create trained matrices; and
reducing the dimensions of said trained matrices.
-
-
33. The method of claim 31 wherein said step of assigning a grade based on the amount of relevant knowledge contained in said ungraded sample text comprises the steps of:
-
generating a pseudo-object vector representation for said ungraded sample text; and
computing a vector length of said pseudo-object representation.
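Claim 33 grades on the length of the pseudo-object vector. The length itself is the Euclidean norm; how a length becomes a grade is not stated in the claims, so the linear mapping in `knowledge_grade` below, with its `reference_length` and `max_grade` parameters, is purely a hypothetical illustration.

```python
import math


def vector_length(vec):
    """Euclidean length of a pseudo-object vector."""
    return math.sqrt(sum(x * x for x in vec))


def knowledge_grade(sample_vec, reference_length, max_grade=5.0):
    """Hypothetical mapping: scale the grade by length relative to a reference.

    reference_length might be the vector length of a standard text; a sample
    at least that long receives max_grade.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    ratio = vector_length(sample_vec) / reference_length
    return max_grade * min(ratio, 1.0)


print(vector_length([3.0, 4.0]))          # 5.0
print(knowledge_grade([3.0, 4.0], 10.0))  # 2.5
```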
-
-
34. A computer readable storage medium tangibly embodying programmed instructions for performing a method for grading an ungraded sample text relative to at least one standard text, the method comprising the steps of:
-
generating trained matrices with said at least one standard text;
determining a degree of similarity between said ungraded sample text and said at least one standard text using said trained matrices;
determining a vector length corresponding to said ungraded sample text; and
assigning a grade to said ungraded sample text based on said vector length.
View Dependent Claims (35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45)
generating a pseudo-object vector representation for said ungraded sample text; and
computing a vector length of said pseudo-object vector representation.
-
-
36. The storage medium of claim 34 wherein said at least one standard text comprises at least one pregraded essay and wherein the method further comprises the steps of:
-
determining a subset of said at least one pregraded essay that is most similar to said ungraded sample text;
computing a weighted average of said subset; and
assigning said weighted average as a grade for said ungraded sample text.
-
-
37. The storage medium of claim 34
wherein said at least one standard text is parsed into predefined portions and said ungraded sample text is parsed into predefined portions, and wherein the step of determining a degree of similarity between said ungraded sample text and said at least one standard text further comprises the step of determining the degree of similarity between said predefined portion of said at least one standard text and said predefined portion of said ungraded sample text, and wherein the method further comprises the step of: assigning a grade to said ungraded sample text based on said degree of similarity between said predefined portion of said ungraded sample text and said predefined portion of said at least one standard text.
-
38. The storage medium of claim 34
wherein said at least one standard text comprises a plurality of additional ungraded sample texts, and wherein the step of determining a degree of similarity between said ungraded sample text and said at least one standard text comprises the step of: determining a relative ranking of said plurality of additional ungraded sample texts and said ungraded sample text.
-
39. The storage medium of claim 34 wherein the method further comprises the step of:
parsing said plurality of reference texts into a first set of text object vectors and a first set of segment vectors.
-
40. The storage medium of claim 39 wherein the method step of generating trained matrices with said plurality of reference texts comprises the steps of:
-
generating a data matrix using said first set of text object vectors and said first set of segment vectors;
using singular value decomposition to decompose said data matrix to create said trained matrices; and
reducing the dimensions in said trained matrices.
-
-
41. The storage medium of claim 40 wherein the method step of generating a data matrix comprises the steps of:
-
creating a data matrix using said first set of text object vectors as a first dimension and said first set of segment vectors as a second dimension; and
applying a weighted value to each cell in said data matrix.
-
-
42. The storage medium of claim 41 wherein the method step of creating a data matrix further comprises the steps of:
-
parsing said at least one standard text into a second set of text object vectors and a second set of segment vectors; and
including in said data matrix said second set of text object vectors in said first dimension and said second set of segment vectors in said second dimension.
-
-
43. The storage medium of claim 34 wherein the method step of determining a degree of similarity comprises the steps of:
-
generating a pseudo-object vector representation of said ungraded sample text;
computing a vector representation for said at least one standard text;
comparing said pseudo-object vector representation and said vector representation; and
assigning a grade to said ungraded sample text based on said comparison.
-
-
44. The storage medium of claim 43 wherein the method step of generating a pseudo-object vector representation comprises the steps of:
-
parsing said ungraded sample text into a first set of text object vectors; and
computing the average of said first set of text object vectors by averaging the sum of each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
-
-
45. The storage medium of claim 43 wherein the method step of computing a vector representation comprises the steps of:
-
parsing said at least one standard text into a first set of text object vectors; and
computing the average of said first set of text object vectors by averaging the sum of each text object vector element of each of said first set of text object vectors in accordance with said trained matrices.
-
Specification