SYSTEMS AND METHODS FOR NORMALIZING INPUT MEDIA

US 20120072204A1
Filed: 09/22/2010
Published: 03/22/2012
Est. Priority Date: 09/22/2010
Status: Active Grant

Abstract

A method and system for processing input media for provision to a text to speech engine comprising: a rules engine configured to maintain and update rules for processing the input media; a pre-parsing filter module configured to determine one or more metadata attributes using pre-parsing rules; a parsing filter module configured to identify a content component from the input media using the parsing rules; a context and language detector configured to determine a default context and a default language; a learning agent configured to divide the content component into units of interest; a tagging module configured to iteratively assign tags to the units of interest using the tagging rules, wherein each tag is associated with a post-parsing rule; and a post-parsing filter module configured to modify the content component by executing the post-parsing rules identified by the tags assigned to the phrases and strings. The context and language detector, tagging module, learning agent and post-parsing filter module are configured to iteratively process the content component and modifications thereto until there are no further modifications or a threshold number of iterations are performed.
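
The iterative processing described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the rule table `TAGGING_RULES`, the tag names, the whitespace-based `split_units`, and the default of five iterations are all hypothetical choices made for illustration. The sketch shows only the control flow of tagging units, executing the rewrite tied to each tag, and repeating until nothing changes or an iteration threshold is reached.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical rule table: tag name -> (pattern that earns the tag,
# post-parsing rewrite executed for units carrying that tag).
TAGGING_RULES: Dict[str, Tuple[re.Pattern, Callable[[str], str]]] = {
    "HTML_AMP": (re.compile(r"^&amp;$"), lambda unit: "&"),
    "AMPERSAND": (re.compile(r"^&$"), lambda unit: "and"),
    "ABBREV_DR": (re.compile(r"^Dr\.$"), lambda unit: "Doctor"),
    "PERCENT": (re.compile(r"^\d+%$"), lambda unit: unit[:-1] + " percent"),
}


def split_units(content: str) -> List[str]:
    """Stand-in for the learning agent: whitespace-delimited units of interest."""
    return content.split()


def assign_tags(units: List[str]) -> List[List[str]]:
    """Stand-in for the tagging module: every matching rule tags the unit."""
    return [
        [tag for tag, (pattern, _) in TAGGING_RULES.items() if pattern.match(unit)]
        for unit in units
    ]


def apply_post_parsing(units: List[str], tags: List[List[str]]) -> str:
    """Stand-in for the post-parsing filter: run the rewrite tied to each tag."""
    rewritten = []
    for unit, unit_tags in zip(units, tags):
        for tag in unit_tags:
            unit = TAGGING_RULES[tag][1](unit)
        rewritten.append(unit)
    return " ".join(rewritten)


def normalize(content: str, max_iterations: int = 5) -> str:
    """Repeat tag-and-rewrite until no modification or the threshold is hit."""
    for _ in range(max_iterations):
        units = split_units(content)
        modified = apply_post_parsing(units, assign_tags(units))
        if modified == content:  # no further modifications
            break
        content = modified
    return content
```

In this sketch the input `salt &amp; pepper` needs two passes: the first pass turns `&amp;` into `&`, and only the second pass can tag the resulting `&` and rewrite it to `and`. That dependency between passes is the reason for iterating rather than applying the rules once.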

29 Claims

1. A system for processing input media for provision to a text to speech engine comprising:
- a rules engine configured to maintain and update rules for processing the input media, wherein the rules comprise pre-parsing rules, parsing rules, tagging rules, and post-parsing rules;
  
  a pre-parsing filter module configured to determine one or more metadata attributes using pre-parsing rules, wherein one metadata attribute is an application type;
  
  a parsing filter module configured to query the rules engine for parsing rules associated with the one or more metadata attributes and to identify a content component from the input media using the parsing rules;
  
  a context and language detector configured to determine a default context and a default language for at least part of the content component;
  
  a learning agent configured to divide the content component into units of interest;
  
  a tagging module configured to query the rules engine for tagging rules associated with the default context and the default language and to iteratively assign tags to the units of interest using the tagging rules, wherein each tag is associated with a post-parsing rule;
  
  a post-parsing filter module configured to modify the content component by executing the post-parsing rules identified by the tags assigned to the units of interest;
  
  wherein the context and language detector, tagging module, learning agent and post-parsing filter module are configured to iteratively process the content component and modifications thereto until there are no further modifications or a threshold number of iterations are performed; and
  
  an output module configured to transmit the modified content component.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  - 2. The system of claim 1 further comprising:
    - a text to speech dictionary module configured to process common literals in the modified content component;
      
      a formatting module configured to convert the modified content component to speech synthesis markup language text with embedded speech directives; and
      
      a text to speech engine configured to convert the speech synthesis markup language text to speech signals and transmit the speech signals.
  - 3. The system of claim 1 wherein the context and language detector is operable to detect a local language for one or more units of interest of the content component, wherein the local language is different from the default language;
    - and wherein the tagging module is configured to query the rules engine for tagging rules associated with the local language.
  - 4. The system of claim 1 wherein a metadata attribute is the default context, and wherein the pre-parsing module is configured to determine the default context based on the application type.
  - 5. The system of claim 1 wherein a metadata attribute is the default language, and wherein the pre-parsing module is configured to determine the default language based on the language identifying meta-data or the application type.
  - 6. The system of claim 1 wherein each rule comprises an ambiguous case resolution mechanism;
    - wherein, in response to detecting an ambiguity in the content component, the ambiguous case resolution mechanism uses the metadata attributes and statistical data to statistically resolve the ambiguity.
  - 7. The system of claim 1 further comprising a quality assurance agent that is prompted to resolve an ambiguity in the event the ambiguity cannot be resolved statistically within a pre-configured minimum level of confidence.
  - 8. The system of claim 1 wherein the pre-parsing module is configured to determine the one or more metadata attributes by applying different test rules to the input media;
    - computing a value for each test rule applied;
      
      determining which test rule computed the highest score; and
      
      using the test rule with the highest score to determine the one or more metadata attributes.
  - 9. The system of claim 1 wherein the pre-parsing module is configured to determine metadata attributes by comparing strings in the input media to a set of dictionaries.
  - 10. The system of claim 9 wherein upon determining that two or more test rules have the same highest score, the pre-parsing module is further configured to apply a default priority list to determine which of the two or more test rules to use to determine the metadata attributes.
  - 11. The system of claim 1 wherein each tag is associated with a confidence score and wherein when two or more tags conflict the post-parsing filter module is configured to execute the post-parsing rule associated with the tag with the highest confidence score.
  - 12. The system of claim 1 wherein the tagging module is configured to iteratively assign the tags until no more tagging rules apply.
  - 13. The system of claim 1 wherein the tagging module is configured to iteratively assign the tags until a maximum limit of tags per word is reached.
  - 14. The system of claim 1 wherein the context and language detector determines the default context by computing, for each context, an aggregate score for the content component using a context model, wherein the context model defines, for each context, a list of strings and associated scores, wherein the default context is the context with the highest aggregate score for the content component.
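
The context scoring recited in claim 14 can be sketched in Python as follows. The contexts, strings, and scores in `CONTEXT_MODEL` are invented for illustration, and the word-level tokenizer is an assumption; the claim specifies only that each context has a list of strings with associated scores and that the context with the highest aggregate score becomes the default.

```python
import re
from typing import Dict

# Hypothetical context model: for each context, strings and associated scores.
CONTEXT_MODEL: Dict[str, Dict[str, float]] = {
    "finance": {"stock": 2.0, "shares": 1.5, "earnings": 2.0},
    "sports": {"goal": 2.0, "match": 1.0, "score": 1.5},
}


def aggregate_scores(
    content: str, model: Dict[str, Dict[str, float]] = CONTEXT_MODEL
) -> Dict[str, float]:
    """Sum, per context, the scores of the model strings found in the content."""
    words = re.findall(r"[a-z']+", content.lower())  # assumed tokenization
    return {
        context: sum(scores.get(word, 0.0) for word in words)
        for context, scores in model.items()
    }


def default_context(
    content: str, model: Dict[str, Dict[str, float]] = CONTEXT_MODEL
) -> str:
    """Pick the context with the highest aggregate score.

    On a tie, max() keeps the context listed first in the model; the claim
    does not say how ties are broken, so this is an arbitrary choice.
    """
    totals = aggregate_scores(content, model)
    return max(totals, key=totals.get)
```

For example, `default_context("Shares rose after strong earnings.")` sums 1.5 for "shares" and 2.0 for "earnings" under the hypothetical finance context and nothing under sports, so finance is selected.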

15. A method for processing input media for provision to a text to speech engine comprising:
- maintaining and updating rules for processing the input media, wherein the rules comprise pre-parsing rules, parsing rules, tagging rules, and post-parsing rules;
  
  determining one or more metadata attributes using pre-parsing rules, wherein one metadata attribute is an application type;
  
  identifying a content component from the input media using parsing rules associated with the one or more metadata attributes;
  
  determining, for at least part of the content component, a default context and a default language;
  
  dividing the content component into units of interest;
  
  iteratively assigning tags to the units of interest using the tagging rules associated with the default context and the default language, wherein each tag is associated with a post-parsing rule;
  
  modifying the content component by executing the post-parsing rules identified by the tags assigned to the phrases and strings;
  
  iteratively processing the content component and modifications thereto until there are no further modifications or a threshold number of iterations are performed; and
  
  outputting the modified content component.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
  - 16. The method of claim 15 further comprising:
    - processing common literals in the modified content component;
      
      converting the modified content component to speech synthesis markup language text with embedded speech directives; and
      
      converting the speech synthesis markup language text to speech signals and transmitting the speech signals.
  - 17. The method of claim 15 further comprising detecting a local language for one or more phrases and strings of the content component, wherein the local language is different from the default language;
    - and querying the rules engine for tagging rules associated with the local language.
  - 18. The method of claim 15 wherein a metadata attribute is a default context, and the method further comprises determining the default context based on the application type.
  - 19. The method of claim 15 wherein a metadata attribute is the default language, and the method further comprises determining the default language based on the language identifying meta-data or the application type.
  - 20. The method of claim 15 wherein each rule comprises an ambiguous case resolution mechanism;
    - wherein the method further comprises detecting an ambiguity in the content component and using the ambiguous case resolution mechanism to statistically resolve the ambiguity based on the metadata attributes and statistical data.
  - 21. The method of claim 20 further comprising prompting a quality assurance agent to resolve an ambiguity in the event the ambiguity cannot be resolved statistically with a pre-configured minimum level of confidence.
  - 22. The method of claim 15 wherein determining the one or more metadata attributes comprises:
    - applying different test rules to the input media;
      
      computing a value for each test rule applied;
      
      determining which test rule computed the highest score; and
      
      using the test rule with the highest score to determine the one or more metadata attributes.
  - 23. The method of claim 15 wherein determining metadata attributes comprises comparing strings in the input media to a set of dictionaries.
  - 24. The method of claim 22 wherein upon determining that two or more test rules have the same highest score, the method further comprises applying a default priority list to determine which of the two or more test rules to use to determine the metadata attributes.
  - 25. The method of claim 15 wherein each tag is associated with a confidence score and wherein when two or more tags conflict the method further comprises executing the post-parsing rule associated with the tag with the highest confidence score.
  - 26. The method of claim 15 further comprising iteratively assigning the tags until no more tagging rules apply.
  - 27. The method of claim 15 further comprising iteratively assigning the tags until a maximum limit of tags per word is reached.
  - 28. The method of claim 15 further comprising determining the default context by computing, for each context, an aggregate score for the content component using a context model, wherein the context model defines, for each context, a list of strings and associated scores, wherein the default context is the context with the highest aggregate score for the content component.
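
The test-rule selection of claims 22 and 24 (and the corresponding system claims) can be sketched in Python as below. The two test rules, the marker strings they look for, and the `DEFAULT_PRIORITY` order are hypothetical; the claims state only that each test rule yields a value, that the highest-scoring rule is used, and that a default priority list decides between rules sharing the highest score.

```python
from typing import Callable, Dict, List

TestRule = Callable[[str], float]


def looks_like_email(media: str) -> float:
    """Hypothetical test rule: count e-mail style header names in the input."""
    lowered = media.lower()
    return float(sum(header in lowered for header in ("from:", "to:", "subject:")))


def looks_like_rss(media: str) -> float:
    """Hypothetical test rule: count RSS feed markers in the input."""
    lowered = media.lower()
    return float(sum(marker in lowered for marker in ("<rss", "<channel", "<item")))


TEST_RULES: Dict[str, TestRule] = {"email": looks_like_email, "rss": looks_like_rss}

# Assumed default priority list; every rule name must appear in it.
DEFAULT_PRIORITY: List[str] = ["email", "rss"]


def detect_application_type(
    media: str,
    rules: Dict[str, TestRule] = TEST_RULES,
    priority: List[str] = DEFAULT_PRIORITY,
) -> str:
    """Apply every test rule, keep the top scorers, break ties by priority."""
    scores = {name: rule(media) for name, rule in rules.items()}
    best = max(scores.values())
    tied = [name for name, score in scores.items() if score == best]
    return min(tied, key=priority.index)  # earliest entry in the priority list wins
```

Here the returned rule name doubles as the application type. In a fuller system the winning rule would presumably go on to produce further metadata attributes, which this sketch leaves out.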

29. A non-transitory computer-readable medium upon which a plurality of instructions are stored, the instructions for performing the steps of:
- maintaining and updating rules for processing the input media, wherein the rules comprise pre-parsing rules, parsing rules, tagging rules, and post-parsing rules;
  
  determining one or more metadata attributes using pre-parsing rules, wherein one metadata attribute is an application type;
  
  identifying a content component from the input media using parsing rules associated with the one or more metadata attributes;
  
  determining, for at least part of the content component, a default context and a default language;
  
  dividing the content component into units of interest;
  
  iteratively assigning tags to the units of interest using the tagging rules associated with the default context and the default language, wherein each tag is associated with a post-parsing rule;
  
  modifying the content component by executing the post-parsing rules identified by the tags assigned to the phrases and strings;
  
  iteratively processing the content component and modifications thereto until there are no further modifications or a threshold number of iterations are performed; and
  
  outputting the modified content component.
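
The modifying step recited above, combined with the confidence scores of claims 11 and 25, can be sketched in Python as follows. The `Tag` class, the sample tags, and the simplification that all tags on one unit conflict with each other are assumptions for illustration; the claims say only that when tags conflict, the post-parsing rule of the tag with the highest confidence score is executed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tag:
    name: str
    confidence: float                 # confidence score attached to the tag
    rule: Callable[[str], str]        # post-parsing rule identified by the tag


def resolve_and_apply(unit: str, tags: List[Tag]) -> str:
    """Execute the rule of the most confident tag on a unit of interest.

    Assumption: every tag on the same unit conflicts with the others, so
    exactly one rule runs. An untagged unit is returned unchanged.
    """
    if not tags:
        return unit
    winner = max(tags, key=lambda tag: tag.confidence)
    return winner.rule(unit)


# Hypothetical conflicting readings of the abbreviation "St."
street = Tag("STREET", 0.4, lambda unit: "Street")
saint = Tag("SAINT", 0.7, lambda unit: "Saint")
```

With these sample tags, `resolve_and_apply("St.", [street, saint])` executes only the rule of the `SAINT` tag, because its confidence of 0.7 exceeds 0.4.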

Current Assignee: Voice on The Go, Inc.
Original Assignee: Voice on The Go, Inc.
Inventors: Nasri, Babak; Thayaparam, Selva

Granted Patent: US 8,688,435 B2
US Class Current: 704/9
CPC Class Codes:
G06F 40/131   Fragmentation of text files...
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/232   Orthographic correction, e....
G06F 40/253   Grammatical analysis; Style...
G06F 40/263   Language identification
G06F 40/295   Named entity recognition
G10L 13/086   Detection of language
G10L 13/10   Prosody rules derived from ...
