Methods and systems for assessing the quality of automatically generated text

US 8,442,813 B1
Filed: 02/05/2009
Issued: 05/14/2013
Est. Priority Date: 02/05/2009
Status: Expired due to Fees

Abstract

A set of ordered characters is received in association with information specifying the locations of the characters within the image of the document. Language-conditional character probabilities for each character are determined based on a set of language models and the ordering of the characters. Neighbor characters associated with a target character are identified based on the locations of the characters. Language-conditional character probabilities associated with the neighbor characters and language-conditional character probabilities associated with the target character are combined to generate a local language-conditional likelihood associated with the target character, the local language-conditional likelihood representing a concordance of the target character to a language model.
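The neighbor-identification and combination steps summarized in the abstract can be sketched as follows. This is a minimal illustration for a single language model, not the patented implementation: the `OcrChar` record, the Euclidean distance test, the Gaussian proximity weight with its `sigma` parameter, and the weighted mean of log-probabilities are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class OcrChar:
    """Hypothetical record for one recognized character."""
    glyph: str
    x: float          # location in the page image (assumed pixel coordinates)
    y: float
    log_prob: float   # log P(glyph | preceding characters) under one language model


def distance(a, b):
    """Euclidean distance between two characters in the image."""
    return math.hypot(a.x - b.x, a.y - b.y)


def find_neighbors(target, chars, radius):
    """Characters other than the target that lie within `radius` of it."""
    return [c for c in chars if c is not target and distance(c, target) <= radius]


def local_log_likelihood(target, chars, radius, sigma):
    """Proximity-weighted mean log-probability over the target and its neighbors.

    The target itself gets weight 1; each neighbor gets a Gaussian weight
    that decays with its distance from the target (an assumed kernel).
    """
    weighted_sum, weight_total = target.log_prob, 1.0
    for c in find_neighbors(target, chars, radius):
        w = math.exp(-distance(c, target) ** 2 / (2 * sigma ** 2))
        weighted_sum += w * c.log_prob
        weight_total += w
    return weighted_sum / weight_total


if __name__ == "__main__":
    page = [
        OcrChar("t", 0.0, 0.0, math.log(0.4)),
        OcrChar("h", 8.0, 0.0, math.log(0.5)),
        OcrChar("#", 16.0, 0.0, math.log(0.001)),  # likely OCR noise
    ]
    for ch in page:
        print(ch.glyph, local_log_likelihood(ch, page, radius=20.0, sigma=10.0))
```

Working in log space keeps the combination numerically stable, and the weighting lets a poorly recognized region pull down the likelihood of nearby characters more than distant ones.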

19 Claims

1. A computer-implemented method of assessing the quality of computer-generated text, the method comprising:
- receiving a plurality of characters generated from an image of a document;
  
  determining, for the plurality of characters generated from the image of the document, language-conditional character probabilities based on a set of language models and an ordering of the characters, a language-conditional character probability for a target character in the plurality of characters describing a degree to which the target character and an ordered set of characters preceding the target character concord with a given language model in the set of language models;
  
  identifying, for the target character, neighbor characters proximate to a location of the target character in the image of the document, wherein the neighbor characters have associated language-conditional character probabilities and are within a defined distance from the location of the target character in the image of the document;
  
  combining the language-conditional character probabilities associated with the neighbor characters and the language-conditional character probabilities associated with the target character to generate a local language-conditional likelihood for the target character; and
  
  storing the local language-conditional likelihood for the target character.
- - 2. The method of claim 1, further comprising:
    - determining a text quality score associated with the target character based on the local language-conditional likelihood and a cross-entropy rate associated with each language model; and
      
      storing the text quality score.
  - 3. The method of claim 1, wherein identifying the neighbor characters comprises:
    - identifying an area within the image of the document associated with the location of the target character, wherein the area specifies the defined distance between the neighbor characters and the target character.
  - 4. The method of claim 1, further comprising:
    - determining weights associated with the neighbor characters based on proximity of the neighbor characters to the target character in the image of the document;
      
      wherein combining the language-conditional character probabilities associated with the neighbor characters and the language-conditional character probabilities associated with the target character to generate the local language-conditional likelihood comprises modifying the language-conditional character probability for each neighbor character based on a weight associated with the neighbor character.
  - 5. The method of claim 1, wherein combining the language-conditional character probabilities associated with the neighbor characters and the language-conditional character probabilities associated with the target character to generate a local language-conditional likelihood associated with the target character comprises:
    - generating, for each language model of the set of language models, a value that specifies a probability that the target character is associated with a writing system represented by the language model based on a subset of the language-conditional character probabilities associated with the neighbor characters that are generated based on the language model; and
      
      combining the values that specify the probabilities that the target character is associated with the writing systems represented by the set of language models to generate the local language-conditional likelihood.
  - 6. The method of claim 1, wherein determining, for the plurality of characters generated from the image of the document, language-conditional character probabilities based on a set of language models and an ordering of characters further comprises:
    - determining a language-conditional character probability for a character based on a specified number of characters which precede the character in the order.
  - 7. The method of claim 6, wherein the language-conditional character probability is determined based on a conditional probability defined by the given language model, the conditional probability representing a likelihood of observing the character given the specified number of characters, and their order, which precede the character in the writing system represented by the given language model and wherein a high likelihood of observing the character indicates that the character concords with the language model.
  - 8. The method of claim 1, wherein the local language-conditional likelihood represents a concordance of the target character based in part on the target character's location within the image of the document to a language model of the set of language models.
  - 9. The method of claim 1, further comprising:
    - receiving information specifying locations of the plurality of characters within the image of the document, the information comprising two-dimensional coordinates, wherein the defined distance between the neighbor characters and the target character is determined based on the two-dimensional coordinates and the local language-conditional likelihood is associated with a location of the target character in the image of the document specified by the two-dimensional coordinates.
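
Claims 6 and 7 describe a conditional probability of a character given a specified number of preceding characters. A minimal sketch of that idea is a character n-gram model. The `CharNGramModel` class, the add-one smoothing, the space padding at the start of the text, and the `alphabet_size` parameter are assumptions made for illustration; the claims do not prescribe any particular estimator.

```python
import math
from collections import Counter


class CharNGramModel:
    """Hypothetical character n-gram model with add-one smoothing."""

    def __init__(self, training_text, order=3, alphabet_size=256):
        self.order = order                  # n: the context is the n-1 preceding characters
        self.alphabet_size = alphabet_size  # assumed vocabulary size used for smoothing
        self.context_counts = Counter()
        self.ngram_counts = Counter()
        k = order - 1
        padded = " " * k + training_text    # pad so the first characters have a context
        for i in range(k, len(padded)):
            context = padded[i - k:i]
            self.context_counts[context] += 1
            self.ngram_counts[context + padded[i]] += 1

    def log_prob(self, char, preceding):
        """log P(char | last n-1 characters of `preceding`)."""
        k = self.order - 1
        context = (" " * k + preceding)[-k:] if k else ""
        numerator = self.ngram_counts[context + char] + 1
        denominator = self.context_counts[context] + self.alphabet_size
        return math.log(numerator / denominator)


if __name__ == "__main__":
    model = CharNGramModel("the theory then thinned the thread", order=3)
    # A character that often follows "th" in the training text scores higher
    # (concords better with the model) than one that never does.
    print(model.log_prob("e", "th"), model.log_prob("q", "th"))
```

A higher log-probability corresponds to the "high likelihood of observing the character" that claim 7 treats as concordance with the language model.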

10. A computer-implemented method of assessing the quality of computer-generated text, the method comprising:
- receiving a target character and a set of ordered characters preceding the target character;
  
  determining at least a first language-conditional character probability for the target character based on at least a first language model and the ordering of the characters in the set;
  
  identifying neighbor characters within a defined distance from a location of the target character in a digital text from which the target character and the set of ordered characters preceding the target character were generated;
  
  determining at least a first language-conditional character probability for each identified neighbor character based on at least the first language model and an ordering of characters preceding a neighbor character; and
  
  combining the language-conditional character probabilities associated with the neighbor characters and the language-conditional character probabilities associated with the target character to generate a local language-conditional likelihood for the target character, wherein the local language-conditional likelihood represents a concordance of the target character to at least the first language model; and
  
  storing the local language-conditional likelihood for the target character.

11. A computer system for assessing the quality of computer-generated text, comprising:
- a processor for executing computer program instructions;
  
  a computer-readable storage medium storing executable computer program instructions, the computer program instructions comprising:
  
  a language-conditional character probability module executable to:
  
  receive a plurality of characters generated from an image of a document; and
  
  determine, for the plurality of characters generated from the image of the document, language-conditional character probabilities based on a set of language models and an ordering of the characters, a language-conditional character probability for a target character in the plurality of characters describing a degree to which the target character and an ordered set of characters preceding the target character concord with a given language model in the set of language models; and
  
  a local language-conditional likelihood module executable to:
  
  identify, for the target character, neighbor characters proximate to a location of the target character in the image of the document, wherein the neighbor characters have associated language-conditional character probabilities and are within a defined distance from the location of the target character in the image of the document;
  
  combine the language-conditional character probabilities associated with the neighbor characters and the language-conditional character probabilities associated with the target character to generate a local language-conditional likelihood for the target character, and
  
  store the local language-conditional likelihood for the target character.
- - 12. The system of claim 11, further comprising a score module executable to:
    - determine a text quality score associated with the target character based on the local language-conditional likelihood and a cross-entropy rate associated with each language model; and
      
      store the text quality score in the memory.
  - 13. The system of claim 11, wherein the local language-conditional likelihood module is further executable to:
    - identify an area within the image of the document associated with the location of the target character, wherein the area specifies the defined distance between the neighbor characters and the target character.
  - 14. The system of claim 11, wherein the local language-conditional likelihood module is further executable to:
    - determine weights associated with the neighbor characters based on the proximity of the neighbor characters to the target character in the image of the document; and
      
      wherein combining the language-conditional character probabilities associated with the neighbor characters and the language-conditional character probabilities associated with the target character to generate the local language-conditional likelihood comprises modifying the language-conditional character probability for each neighbor character based on a weight associated with the neighbor character.
  - 15. The system of claim 11, wherein the local language-conditional likelihood module is further executable to:
    - generate, for each language model of the set of language models, a value that specifies a probability that the target character is associated with a writing system represented by the language model based on a subset of the language-conditional character probabilities associated with the neighbor characters that are generated based on the language model; and
      
      combine the values that specify the probabilities that the target character is associated with the writing systems represented by the set of language models to generate the local language-conditional likelihood.
  - 16. The system of claim 11, wherein determining, for the plurality of characters generated from the image of the document, language-conditional character probabilities based on a set of language models and an ordering of characters further comprises:
    - determining a language-conditional character probability for a character based on a specified number of characters which precede the character in the order.
  - 17. The system of claim 16, wherein the language-conditional character probability is determined based on a conditional probability defined by the given language model, the conditional probability representing a likelihood of observing the character given the specified number of characters, and their order, which precede the character in the writing system represented by the given language model and wherein a high likelihood of observing the character indicates that the character concords with the language model.
  - 18. The system of claim 11, wherein the local language-conditional likelihood represents a concordance of the target character based in part on the target character's location within the image of the document to a language model of the set of language models.
  - 19. The system of claim 11, wherein the language-conditional character probability module is further executable to receive information specifying locations of the plurality of characters within the image of the document, the information comprising two-dimensional coordinates, the defined distance between the neighbor characters and the target character is determined based on the two-dimensional coordinates, and the local language-conditional likelihood is associated with a location of the target character in the image of the document specified by the two-dimensional coordinates.
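
Claims 12 and 15 (and their method counterparts, claims 2 and 5) mention combining per-language-model values and scoring text quality against each model's cross-entropy rate, without giving formulas. The sketch below is one possible reading only. The function names, the uniform prior over models, the softmax normalization, and the `exp(-excess)` scoring rule are all assumptions introduced for this example and are not taken from the patent.

```python
import math


def model_posteriors(local_log_likelihoods):
    """Normalize per-model local log-likelihoods into probabilities.

    Assumes a uniform prior over language models, so this is a softmax.
    Input maps a model name to the local log-likelihood of the target
    character under that model.
    """
    peak = max(local_log_likelihoods.values())
    unnormalized = {name: math.exp(ll - peak)
                    for name, ll in local_log_likelihoods.items()}
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


def quality_score(local_log_likelihoods, cross_entropy_rates):
    """Hypothetical text quality score in (0, 1].

    For each model, the per-character negative log-likelihood is compared
    with that model's cross-entropy rate (same units, nats per character).
    Text that is no more surprising than typical text in the language
    scores 1; the score decays as the excess surprise grows.
    """
    posteriors = model_posteriors(local_log_likelihoods)
    score = 0.0
    for name, ll in local_log_likelihoods.items():
        excess = max(-ll - cross_entropy_rates[name], 0.0)
        score += posteriors[name] * math.exp(-excess)
    return score


if __name__ == "__main__":
    rates = {"latin": 1.0, "cyrillic": 1.0}
    clean = {"latin": -1.0, "cyrillic": -5.0}    # fits the Latin-script model well
    garbled = {"latin": -6.0, "cyrillic": -6.0}  # fits no model well
    print(quality_score(clean, rates), quality_score(garbled, rates))
```

Under these assumptions, a character in a cleanly recognized region scores close to 1, while a character in a region that fits none of the language models scores close to 0.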

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Popat, Ashok C.
Primary Examiner(s)
AZAD, ABUL K

Application Number

US12/366,329
Time in Patent Office

1,559 Days
Field of Search

704/1-10, 382/226, 382/228, 382/229, 382/176, 382/177
US Class Current

704/9
CPC Class Codes

G06F 40/253   Grammatical analysis; Style...

G06F 40/51   Translation evaluation

G06V 30/153   using recognition of charac...

G06V 30/246   using linguistic properties...

G06V 30/268   Lexical context
