Text quality evaluation methods and processes

US 10,417,328 B2
Filed: 01/05/2018
Issued: 09/17/2019
Est. Priority Date: 01/05/2018
Status: Active Grant

Abstract

Methods and processes evaluate a quality score of a text. The text includes a plurality of words. The methods compute first probability characteristics of groups of words in a reference text which is known to be a high-quality text. The methods also compute second probability characteristics of groups of words in a text to be scored. The methods also compute the quality score based on a difference between the first probability characteristics and the second probability characteristics.

24 Citations

20 Claims

1. A method (1000) for evaluating a quality score of a text, the text comprising a plurality of words (2001-2005), the method comprising the steps of:
- computing first probability characteristics (1100) of groups of words in a reference text (2000) including a high-quality text,
- computing second probability characteristics (1200) of groups of words in a text to be scored,
- computing the quality score (1300) based on a difference between the first probability characteristics and the second probability characteristics,
- wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises identifying unique groups of words and computing a group weight for each unique group of words, and
- wherein the group weight is defined as
  $$GW_j = \sum_{i\ (\text{for all } S_i \text{ comprising } G_j)} F(L_i)$$
  where:
  - GW_j is the group weight for the j-th group,
  - G_j is the j-th group,
  - L_i is a length of a sentence,
  - S_i is one of the sentences from the reference text (2000), and
  - the function F is a monotonically increasing function.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method (1000) according to claim 1, wherein the groups of words have a length comprised between 2 and 6.
  - 3. The method (1000) according to claim 1, wherein the group weight is based on a number of times the unique group appears in the reference text or the text to be scored.
  - 4. The method (1000) according to claim 1, wherein the group weight is based on the length of sentences in which the unique group appears in the reference text or the text to be scored.
  - 5. The method (1000) according to claim 1, wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises associating to each unique group the respective group weight and a probability of the unique group to appear in a large text corpus.
  - 6. The method (1000) according to claim 5, wherein the large text corpus has a size larger than 1 million sentences.
  - 7. The method (1000) according to claim 5, wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises grouping the unique groups in a plurality of probability intervals based on the probability associated to the unique groups, and combining the group weights within each probability interval.
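
The group weight of claim 1 can be illustrated with a short sketch. This is a minimal, non-authoritative example rather than the patented implementation: it assumes that groups are runs of consecutive words, that sentence length L_i is counted in words, and that each sentence contributes once per unique group. The helper names `word_groups` and `group_weights` are hypothetical, and the choice of F is left to the caller (any monotonically increasing function).

```python
import math
from collections import defaultdict

def word_groups(words, k):
    """Return every run of k consecutive words as a tuple (assumed grouping)."""
    return [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]

def group_weights(sentences, k=2, f=math.sqrt):
    """GW_j = sum of f(L_i) over all sentences S_i that contain group G_j.

    f must be monotonically increasing; math.sqrt is only an example default.
    """
    weights = defaultdict(float)
    for sentence in sentences:
        words = sentence.lower().split()
        length = len(words)  # L_i counted in words (assumption)
        # set(): a sentence adds one term per unique group it contains
        for group in set(word_groups(words, k)):
            weights[group] += f(length)
    return dict(weights)

reference = [
    "the cat sat on the mat",  # 6 words
    "the cat slept",           # 3 words
]
# Identity is the simplest monotonically increasing F.
weights = group_weights(reference, k=2, f=lambda n: n)
print(weights[("the", "cat")])    # appears in both sentences: 6 + 3
print(weights[("cat", "slept")])  # appears only in the 3-word sentence
```

With an increasing F, a group found in long sentences receives more weight than one found only in short sentences, which is the behaviour claim 4 describes.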

8. A method (1000) for evaluating a quality score of a text, the text comprising a plurality of words (2001-2005), the method comprising the steps of:
- computing first probability characteristics (1100) of groups of words in a reference text (2000) including a high-quality text,
- computing second probability characteristics (1200) of groups of words in a text to be scored,
- computing the quality score (1300) based on a difference between the first probability characteristics and the second probability characteristics,
- wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises identifying unique groups of words and computing a group weight for each unique group of words,
- wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises associating to each unique group the respective group weight and a probability of the unique group to appear in a large text corpus,
- wherein the step of computing the first probability characteristics (1100) or the step of computing the second probability characteristics (1200) comprises grouping the unique groups in a plurality of probability intervals based on the probability associated to the unique groups, and combining the group weights within each probability interval, and
- wherein a number of the probability intervals is comprised between 3 and 50.
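
The interval step recited in claim 8 might be sketched as follows. The claim does not say how the intervals are spaced or how weights are combined, so this example assumes equal-width intervals on log10 probability, summation as the combining operation, and normalisation of the result. The names `bin_weights`, `profile_difference` and `min_log_p`, and the sample probabilities, are all hypothetical.

```python
import math

def bin_weights(weights, corpus_prob, n_bins=10, min_log_p=-8.0):
    """Sum group weights per interval of log10 corpus probability."""
    if not 3 <= n_bins <= 50:
        raise ValueError("claim 8 recites between 3 and 50 intervals")
    span = -min_log_p  # width of the log-probability range [min_log_p, 0]
    bins = [0.0] * n_bins
    for group, weight in weights.items():
        p = corpus_prob.get(group, 0.0)
        # Groups unseen in the corpus fall into the lowest interval.
        log_p = math.log10(p) if p > 0 else min_log_p
        log_p = max(min_log_p, min(0.0, log_p))
        idx = min(n_bins - 1, int((log_p - min_log_p) / span * n_bins))
        bins[idx] += weight
    total = sum(bins)
    return [b / total for b in bins] if total else bins

def profile_difference(first, second):
    """One possible 'difference' between two profiles: the L1 distance."""
    return sum(abs(a - b) for a, b in zip(first, second))

# Made-up weights and corpus probabilities, for illustration only.
weights = {("a", "b"): 2.0, ("c", "d"): 1.0, ("e", "f"): 1.0}
corpus_prob = {("a", "b"): 1e-3, ("c", "d"): 1e-7}
profile = bin_weights(weights, corpus_prob, n_bins=4)
print(profile)
```

Running the same function on the reference text and on the text to be scored gives two profiles of equal length, which can then be compared.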

9. A method (1000) for evaluating a quality score of a text, the text comprising a plurality of words (2001-2005), the method comprising the steps of:
- computing first probability characteristics (1100) of groups of words in a reference text (2000) including a high-quality text,
- computing second probability characteristics (1200) of groups of words in a text to be scored,
- computing the quality score (1300) based on a difference between the first probability characteristics and the second probability characteristics,
- wherein the step of computing the quality score comprises creating a transformation matrix based on the first probability characteristics,
- wherein the transformation matrix is a matrix which, when applied to transform the first probability characteristics into transformed first probability characteristics and to transform the transformed first probability characteristics into reconstructed first probability characteristics, is configured to minimize a difference between the first probability characteristics and the reconstructed first probability characteristics.
- Dependent claims (10)
  - 10. The method (1000) according to claim 9, wherein the step of computing the quality score (1300) comprises
    - applying the transformation matrix to the second probability characteristics to transform the second probability characteristics into transformed second probability characteristics and to transform the transformed second probability characteristics into reconstructed second probability characteristics, and
    - computing the quality score based on a difference between the second probability characteristics and the reconstructed second probability characteristics.
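
Claims 9 and 10 describe a matrix that minimises reconstruction error on the reference characteristics and is then applied to the text to be scored. One technique with this property is a rank-1 principal-component projection; the sketch below uses it purely as an illustration, since the claims do not name a specific technique. It assumes the reference text yields several profile vectors (for example one per document), and the function names and sample numbers are hypothetical.

```python
import math

def top_direction(rows, iterations=100):
    """Power iteration for the dominant eigenvector of X^T X (unit length).

    Projecting onto this direction minimises the squared reconstruction
    error of the rows among all rank-1 linear projections.
    """
    dim = len(rows[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [0.0] * dim
        for x in rows:  # accumulate w = X^T (X v) row by row
            proj = sum(a * b for a, b in zip(x, v))
            for i, a in enumerate(x):
                w[i] += proj * a
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return v

def reconstruction_error(x, v):
    """Distance between x and its reconstruction (x . v) v."""
    z = sum(a * b for a, b in zip(x, v))  # transform
    x_hat = [z * b for b in v]            # reconstruct
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))

# Made-up reference profiles (first probability characteristics).
reference_profiles = [
    [0.50, 0.30, 0.20],
    [0.52, 0.28, 0.20],
    [0.48, 0.32, 0.20],
]
v = top_direction(reference_profiles)

typical = [0.50, 0.30, 0.20]   # resembles the reference profiles
atypical = [0.10, 0.10, 0.80]  # very different distribution
print(reconstruction_error(typical, v) < reconstruction_error(atypical, v))
```

A profile that resembles the reference reconstructs almost exactly, while an unusual profile leaves a large residual, so the residual can serve as the difference on which the score is based.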

11. A method (4000) for evaluating a quality value of a text, the text comprising a plurality of words (5001-5005), the method comprising the steps of:
- grouping (4110, 4120) the plurality of words (5001-5005) into a plurality of arrays (5010, 5020) of groups (5011-5014, 5021-5023) of words (5001-5005), wherein all groups (5011-5014, 5021-5023) comprise a predetermined number of words, wherein the predetermined number of words is the same for all groups (5011-5014, 5021-5023) of a given array (5010, 5020), and wherein the predetermined number of words is different for different arrays (5010, 5020),
- computing a repetition value (4210, 4220) for each array (5010, 5020) based on a number of repeated groups within each array (5010, 5020),
- computing the quality value (4300) based on the repetition values,
- wherein the step of computing the repetition value (4210, 4220) comprises the steps of:
  - counting (6211) a number of times each unique group (5011-5014, 5021-5023) is repeated within a single array (5010, 5020), and
  - computing (6212) a repeated content ratio according to the equation
    $$R_k = \frac{\sum_{n_j > 1} n_j}{\sum_{j} n_j}$$
    wherein:
    - n_j denotes the number of times a j-th unique group (5011-5014, 5021-5023) is repeated in the single array (5010, 5020), and
    - k is the predetermined number of words.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. The method according to claim 11, wherein the predetermined number of words is comprised between 2 and 20.
  - 13. The method according to claim 11, wherein the step of computing the quality value (4300) comprises a global dampening step (7301) configured to reduce the repeated content ratio for smaller values of repeated content ratio more than for higher values of repeated content ratio.
  - 14. The method according to claim 13, wherein the global dampening step (7301) comprises the operation:
    - $$Rd_k = R_k^{\text{insensitivity}}$$
  - 15. The method according to claim 14, wherein insensitivity has a value comprised between 1 and 5.
  - 16. The method according to claim 11, wherein the step of computing the quality value (4300) comprises a local dampening step (7302) which is configured to reduce the repeated content ratio for arrays having a smaller predetermined number of words more than for arrays having a larger predetermined number of words.
  - 17. The method according to claim 16, wherein the local dampening step (7302) comprises the operation:
    - $$Rstd_k = R_k^{\left(\min(k)/k\right)^{EXP}}$$
      wherein:
      - min(k) is a minimum value of k used by the method (4000), and
      - EXP is chosen to be higher than 0.
  - 18. The method according to claim 14, wherein the step of computing the quality value (4300) comprises a local dampening step (7302) which is configured to reduce the repeated content ratio for arrays having a smaller predetermined number of words more than for arrays having a larger predetermined number of words.
  - 19. The method according to claim 18, wherein the local dampening step (7302) comprises the operation:
    - $$Rstd_k = Rd_k^{\left(\min(k)/k\right)^{EXP}}$$
      wherein:
      - min(k) is a minimum value of k used by the method (4000), and
      - EXP is chosen to be higher than 0.
  - 20. The method according to claim 11, wherein the step of computing the quality value (4300) comprises an averaging step (7303) configured to average the repetition values, or values computed based on the repetition values, to provide a single output as indicative of the quality value.
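
The repetition-based method of claims 11 to 20 can be followed end to end in a short sketch. It is an illustration under stated assumptions, not the patented implementation: groups are taken as overlapping runs of k consecutive words, n_j is the occurrence count of each unique group, and the default values for `ks`, `insensitivity` and `exp` are arbitrary picks within the claimed ranges. The function names are hypothetical.

```python
from collections import Counter

def repeated_content_ratio(words, k):
    """R_k: occurrences of groups seen more than once, over all occurrences."""
    groups = [tuple(words[i:i + k]) for i in range(len(words) - k + 1)]
    counts = Counter(groups)  # n_j for each unique group
    total = sum(counts.values())
    if total == 0:
        return 0.0  # text shorter than k words: nothing to repeat
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / total

def quality_value(words, ks=(2, 3, 4), insensitivity=2.0, exp=1.0):
    """Average of the dampened repetition ratios over all group sizes."""
    k_min = min(ks)
    dampened = []
    for k in ks:
        r = repeated_content_ratio(words, k)
        rd = r ** insensitivity            # global dampening (claim 14)
        rstd = rd ** ((k_min / k) ** exp)  # local dampening (claim 19)
        dampened.append(rstd)
    return sum(dampened) / len(dampened)   # averaging step (claim 20)

words = "a b a b c".split()
# Groups of 2: (a,b) twice, (b,a) once, (b,c) once -> 2 of 4 occurrences
print(repeated_content_ratio(words, 2))
# Groups of 3 are all unique, so the ratio is zero
print(repeated_content_ratio(words, 3))
print(quality_value(words, ks=(2, 3)))
```

In this sketch a larger output means more repeated content; how that output maps to "good" or "bad" quality is not stated in the claims and is left open here.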

Current Assignee: Searchmetrics GmbH
Original Assignee: Searchmetrics GmbH
Inventors: Pala, Ahmet Anil; Kagoshima, Alexander; Tober, Marcus
Primary Examiner(s): Islam, Mohammad K
Application Number: US15/863,408
Publication Number: US 20190213247A1
Time in Patent Office: 620 Days
CPC Class Codes: G06F 40/253 Grammatical analysis; Style...