Text quality evaluation methods and processes
First Claim
1. A method (1000) for evaluating a quality score of a text, the text comprising a plurality of words (2001-2005), the method comprising the steps of:
- computing first probability characteristics (1100) of groups of words in a reference text (2000) including a high-quality text,
- computing second probability characteristics (1200) of groups of words in a text to be scored,
- computing the quality score (1300) based on a difference between the first probability characteristics and the second probability characteristics,
- wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises identifying unique groups of words and computing a group weight for each unique group of words, and
- wherein the group weight is defined as
$$GW_j = \sum_{i \,(\text{for all } S_i \text{ comprising } G_j)} F(L_i)$$
where:
GW_j is the group weight for the j-th group,
G_j is the j-th group,
L_i is a length of a sentence,
S_i is one of the sentences from the reference text (2000), and
the function F is a monotonically increasing function.
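The group weight defined in the claim can be illustrated with a minimal Python sketch. Several choices here are assumptions that the claim does not fix: a "group" is taken to be a contiguous run of n words, the sentence length L_i is counted in words, sentences are split naively on terminal punctuation, and F defaults to `math.log1p`, which is one possible monotonically increasing function. The helper names `split_sentences`, `word_groups` and `group_weights` are hypothetical.

```python
import math
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_groups(words, n):
    # Contiguous groups of n words (assumption: a "group" is an n-gram).
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def group_weights(text, n=2, F=math.log1p):
    """Return {G_j: GW_j}, where GW_j sums F(L_i) over sentences S_i containing G_j."""
    weights = defaultdict(float)
    for sentence in split_sentences(text):
        words = re.findall(r"\w+", sentence.lower())
        length = len(words)  # L_i measured in words (assumption)
        # set(): a sentence contributes once per unique group it contains.
        for group in set(word_groups(words, n)):
            weights[group] += F(length)
    return dict(weights)
```

For example, with the identity function as F, `group_weights("The cat sat. The cat sat on the mat.", n=2, F=lambda L: L)` assigns the group `("the", "cat")` a weight of 3 + 6 = 9, because it occurs in a three-word sentence and in a six-word sentence.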
Abstract
Methods and processes evaluate a quality score of a text. The text includes a plurality of words. The methods compute first probability characteristics of groups of words in a reference text which is known to be a high-quality text. The methods also compute second probability characteristics of groups of words in a text to be scored. The methods also compute the quality score based on a difference between the first probability characteristics and the second probability characteristics.
20 Claims
1. A method (1000) for evaluating a quality score of a text, the text comprising a plurality of words (2001-2005), the method comprising the steps of:
- computing first probability characteristics (1100) of groups of words in a reference text (2000) including a high-quality text,
- computing second probability characteristics (1200) of groups of words in a text to be scored,
- computing the quality score (1300) based on a difference between the first probability characteristics and the second probability characteristics,
- wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises identifying unique groups of words and computing a group weight for each unique group of words, and
- wherein the group weight is defined as
$$GW_j = \sum_{i \,(\text{for all } S_i \text{ comprising } G_j)} F(L_i)$$
where:
GW_j is the group weight for the j-th group,
G_j is the j-th group,
L_i is a length of a sentence,
S_i is one of the sentences from the reference text (2000), and
the function F is a monotonically increasing function.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A method (1000) for evaluating a quality score of a text, the text comprising a plurality of words (2001-2005), the method comprising the steps of:
- computing first probability characteristics (1100) of groups of words in a reference text (2000) including a high-quality text,
- computing second probability characteristics (1200) of groups of words in a text to be scored,
- computing the quality score (1300) based on a difference between the first probability characteristics and the second probability characteristics,
- wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises identifying unique groups of words and computing a group weight for each unique group of words,
- wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises associating to each unique group the respective group weight and a probability of the unique group to appear in a large text corpus,
- wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises grouping the unique groups in a plurality of probability intervals based on the probability associated to the unique groups, and combining the group weights within each probability interval, and
- wherein a number of the probability intervals is comprised between 3 and 50.
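The interval grouping recited in claim 8 can be sketched as a simple histogram over group probabilities. This is only an illustration under stated assumptions: the intervals are taken to be equal-width bins on a log10 probability scale, "combining" is taken to mean summation, and every probability is assumed to be strictly positive. The function name `bin_group_weights` and its parameters are hypothetical; only the limit of 3 to 50 intervals comes from the claim.

```python
import math


def bin_group_weights(weights, probabilities, num_intervals=10):
    """Sum group weights inside equal-width intervals of log10(probability)."""
    if not 3 <= num_intervals <= 50:
        raise ValueError("claim 8 recites between 3 and 50 probability intervals")
    # Corpus probability of each unique group, on a log scale (assumption).
    logs = {group: math.log10(probabilities[group]) for group in weights}
    lo, hi = min(logs.values()), max(logs.values())
    # Fall back to a width of 1.0 when all probabilities are equal.
    width = (hi - lo) / num_intervals or 1.0
    bins = [0.0] * num_intervals
    for group, weight in weights.items():
        # Clamp so the most probable group lands in the last interval.
        idx = min(int((logs[group] - lo) / width), num_intervals - 1)
        bins[idx] += weight
    return bins
```

The returned list has one combined weight per probability interval, ordered from the least probable groups to the most probable ones. Computing it once for the reference text and once for the text to be scored would give two vectors of equal length that can be compared.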
9. A method (1000) for evaluating a quality score of a text, the text comprising a plurality of words (2001-2005), the method comprising the steps of:
- computing first probability characteristics (1100) of groups of words in a reference text (2000) including a high-quality text,
- computing second probability characteristics (1200) of groups of words in a text to be scored,
- computing the quality score (1300) based on a difference between the first probability characteristics and the second probability characteristics,
- wherein the step of computing the quality score comprises creating a transformation matrix based on the first probability characteristics,
- wherein the transformation matrix is a matrix which, when applied to transform the first probability characteristics into transformed first probability characteristics and to transform the transformed first probability characteristics into reconstructed first probability characteristics, is configured to minimize a difference between the first probability characteristics and the reconstructed first probability characteristics.
Dependent claims: 10.
11. A method (4000) for evaluating a quality value of a text, the text comprising a plurality of words (5001-5005), the method comprising the steps of:
- grouping (4110, 4120) the plurality of words (5001-5005) into a plurality of arrays (5010, 5020) of groups (5011-5014, 5021-5023) of words (5001-5005), wherein all groups (5011-5014, 5021-5023) comprise a predetermined number of words, wherein the predetermined number of words is the same for all groups (5011-5014, 5021-5023) of a given array (5010, 5020), and wherein the predetermined number of words is different for different arrays (5010, 5020),
- computing a repetition value (4210, 4220) for each array (5010, 5020) based on a number of repeated groups within each array (5010, 5020),
- computing the quality value (4300) based on the repetition values,
- wherein the step of computing the repetition value (4210, 4220) comprises the steps of:
- counting (6211) a number of times each unique group (5011-5014, 5021-5023) is repeated within a single array (5010, 5020), and
- computing (6212) a repeated content ratio according to the equation
$$R_k = \frac{\sum_{j \,:\, n_j > 1} n_j}{\sum_{j} n_j}$$
wherein:
n_j denotes the number of times a j-th unique group (5011-5014, 5021-5023) is repeated in the single array (5010, 5020), and
k is the predetermined number of words.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20.
wherein; Rd_k is a dampened repeated content ratio for a given value of k, insensitivity has a value which can be selected to any value higher than 1.
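The repeated content ratio R_k recited in claim 11 can be illustrated with a short Python sketch. It rests on assumptions the claim leaves open: the array of groups for a given k is built with an overlapping sliding window of k words, words are lowercased alphanumeric tokens, and n_j is read as the number of occurrences of the j-th unique group. The function name `repeated_content_ratio` is hypothetical.

```python
import re
from collections import Counter


def repeated_content_ratio(text, k):
    """R_k: occurrences of groups seen more than once, divided by all occurrences."""
    words = re.findall(r"\w+", text.lower())
    # Array of k-word groups (assumption: overlapping sliding window).
    groups = [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]
    counts = Counter(groups)  # n_j for each unique group
    total = sum(counts.values())
    if total == 0:
        return 0.0  # text shorter than k words
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / total
```

For the text "a b a b c", this gives R_1 = 4/5 (both "a" and "b" occur twice), R_2 = 2/4 (only the pair "a b" repeats) and R_3 = 0, showing how the ratio falls as the group size k grows.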
15. The method according to claim 14, wherein insensitivity has a value comprised between 1 and 5.
16. The method according to claim 11, wherein the step of computing the quality value (4300) comprises a local dampening step (7302) which is configured to reduce the repeated content ratio for arrays having a smaller predetermined number of words more than for arrays having a larger predetermined number of words.
17. The method according to claim 16, wherein the local dampening step (7302) comprises the operation:
$$Rstd_k = R_k^{\left((\min(k)/k)^{EXP}\right)}$$
wherein:
min(k) is a minimum value of k used by the method (4000), and
EXP is chosen to be higher than 0.
18. The method according to claim 14, wherein the step of computing the quality value (4300) comprises a local dampening step (7302) which is configured to reduce the repeated content ratio for arrays having a smaller predetermined number of words more than for arrays having a larger predetermined number of words.
19. The method according to claim 18, wherein the local dampening step (7302) comprises the operation:
$$Rstd_k = Rd_k^{\left((\min(k)/k)^{EXP}\right)}$$
wherein:
min(k) is a minimum value of k used by the method (4000), and
EXP is chosen to be higher than 0.
20. The method according to claim 11, wherein the step of computing the quality value (4300) comprises an averaging step (7303) configured to average the repetition values, or values computed based on the repetition values, to provide a single output as indicative of the quality value.
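The local dampening operation of claim 17 and the averaging step of claim 20 can be combined in a small Python sketch. It takes a mapping from each group size k to its repeated content ratio R_k as input. The function names `local_dampening` and `quality_value`, the default EXP of 1.0, and the use of a plain arithmetic mean are assumptions made for illustration; the claims only require EXP to be higher than 0 and the averaging to yield a single output.

```python
def local_dampening(ratios, exp=1.0):
    """Apply Rstd_k = R_k ** ((min(k) / k) ** EXP) to each ratio."""
    if exp <= 0:
        raise ValueError("EXP is chosen to be higher than 0")
    k_min = min(ratios)  # min(k): smallest group size used
    return {k: r ** ((k_min / k) ** exp) for k, r in ratios.items()}


def quality_value(ratios, exp=1.0):
    """Average the locally dampened ratios into a single output value."""
    dampened = local_dampening(ratios, exp)
    return sum(dampened.values()) / len(dampened)
```

With ratios between 0 and 1, the exponent equals 1 at k = min(k) and shrinks as k grows, so the ratios for larger group sizes are raised relative to those for smaller ones. For instance, `local_dampening({1: 0.25, 4: 0.25}, exp=0.5)` leaves the k = 1 ratio at 0.25 and maps the k = 4 ratio to 0.25 ** 0.5 = 0.5.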
Specification