Hierarchical approach for the statistical vowelization of Arabic text

US 8,069,045 B2
Filed: 09/23/2004
Issued: 11/29/2011
Est. Priority Date: 02/26/2004
Status: Active Grant

Abstract

The present invention relates to the field of computer-aided text and speech processing, and in particular to a method and corresponding system for converting an input text given in an incomplete language (for example, a language in which unvowelized text is used) into speech using a computer-aided grapheme-phoneme conversion. In order to improve completion of the text, it is proposed to:

a) use statistical methods, including decision trees and stochastic language models, to enrich the input text, i.e. to complete it with missing information that may be needed for a full understanding of the input text; and
b) subject the completed input text to said grapheme-phoneme conversion to produce synthetic speech.

Advantageously, the text is completed according to a model hierarchy that gives higher priority to longer chunks of text, i.e. sentences (310, 315, 320), then multiword phrases (330, 335, 340), then words (350, 355, 360), and finally character groups (370, 375, 380, 390).
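
The model hierarchy described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the lookup tables, their probabilities, the Latin-letter toy transliteration, and all function names (`best`, `vowelize_chars`, `vowelize_word`, `vowelize_words`, `vowelize`) are assumptions made for illustration. The sketch tries a whole-sentence match first, then the longest matching multiword phrase, then single words, and finally character groups.

```python
from typing import Dict, List, Optional, Tuple

# Each table maps an unvowelized key to candidate vowelizations with a
# probability. All entries are hypothetical toy data, not real corpus values.
Table = Dict[str, List[Tuple[str, float]]]

SENTENCES: Table = {"ktb alwld aldrs": [("kataba alwaladu aldarsa", 1.0)]}
PHRASES: Table = {"fy albyt": [("fiy albayti", 0.9)]}
WORDS: Table = {
    "ktb": [("kataba", 0.6), ("kutub", 0.4)],
    "alwld": [("alwaladu", 0.8)],
}
CHAR_GROUPS: Table = {
    "al": [("al", 1.0)],
    "d": [("da", 0.5)],
    "r": [("ra", 0.5)],
    "s": [("sa", 0.5)],
}


def best(table: Table, key: str) -> Optional[str]:
    """Return the most probable candidate for key, or None if key is unknown."""
    candidates = table.get(key)
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]


def vowelize_chars(word: str) -> str:
    """Lowest level: greedy longest match over character groups."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            hit = best(CHAR_GROUPS, word[i:j])
            if hit is not None:
                out.append(hit)
                i = j
                break
        else:
            out.append(word[i])  # unknown character: copy it unchanged
            i += 1
    return "".join(out)


def vowelize_word(word: str) -> str:
    """Word level, falling back to character groups."""
    hit = best(WORDS, word)
    return hit if hit is not None else vowelize_chars(word)


def vowelize_words(words: List[str]) -> List[str]:
    """Phrase level: longest multiword match first, then single words."""
    out, i = [], 0
    while i < len(words):
        for j in range(len(words), i + 1, -1):  # spans of two or more words
            hit = best(PHRASES, " ".join(words[i:j]))
            if hit is not None:
                out.append(hit)
                i = j
                break
        else:
            out.append(vowelize_word(words[i]))
            i += 1
    return out


def vowelize(sentence: str) -> str:
    """Sentence level first, then descend through the hierarchy."""
    hit = best(SENTENCES, sentence)
    if hit is not None:
        return hit
    return " ".join(vowelize_words(sentence.split()))


print(vowelize("ktb fy albyt"))
```

In this toy setup, `ktb` is resolved at the word level and `fy albyt` at the phrase level, so longer units take precedence whenever a match exists. A real system would estimate the tables from a vowelized corpus rather than hard-code them.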

20 Claims

1. A method of supplementing an input text given in an incomplete language with missing information, the method comprising:
- enriching said input text given in an incomplete language with the missing information using at least one processor programmed to implement a first statistical method configured to operate on a first type of linguistic unit and a second statistical method configured to operate on a second type of linguistic unit, wherein the enriching comprises applying the first statistical method to the input text to generate an intermediate result, and after the first statistical method has been applied, applying the second statistical method to at least a portion of the intermediate result to generate an enriched representation of the input text.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13):
  - 2. The method according to claim 1, wherein said first statistical method and/or second statistical method includes applying at least one stochastic language model and/or decision tree.
  - 3. The method according to claim 2, wherein said at least one stochastic language model includes an N-gram language model, wherein N is preferably in the range N=1, ..., 5.
  - 4. The method according to claim 2, further comprising:
    - updating at least one dictionary and/or the at least one stochastic language model with new words, phrases or sentences.
  - 5. The method according to claim 2, further comprising:
    - updating at least one dictionary and/or the at least one stochastic language model with feedback information reflecting the actual use of selected words, phrases or sentences.
  - 6. The method according to claim 1, wherein the input text lacks at least some vowels, and wherein the enriched representation of the input text includes the at least some vowels.
  - 7. The method according to claim 6, wherein the incomplete language is a Semitic language.
  - 8. The method according to claim 7, further comprising:
    - subjecting the enriched representation of the input text to grapheme-phoneme conversion to produce a phonetic description of said input text;
    - testing if the phonetic description of an enriched text element follows a language-specific syllable structure; and
    - selecting a different phonetic description for the text element if the phonetic description of the enriched text element does not follow the language-specific syllable structure.
  - 9. The method according to claim 1, further comprising evaluating a given enrichment corpus for enrichment of said input text, said corpus comprising a collection of relevant character combinations and a collection of relevant sequences of predetermined character combinations according to a hierarchical evaluation scheme, wherein respective probability values are stored for items in said collections for a best match speech item selection, reflecting the most probable language-specific use.
  - 10. The method according to claim 9, wherein the input text is subjected to sentence phrase level mapping, followed by phrase level mapping, followed by word level mapping, followed by character level mapping.
  - 11. The method according to claim 9, wherein said best match speech item selection is performed using a longest match algorithm.
  - 12. The method of claim 1 further comprising:
    - subjecting the enriched input text to said grapheme-phoneme conversion to produce a phonetic description of said input text; and
    - converting said phonetic description into synthetic speech.
  - 13. The method of claim 1, wherein the at least some input text on which the first statistical method was applied comprises all of the input text on which the first statistical method was applied.
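
Claims 2 and 3 refer to a stochastic N-gram language model. As a rough illustration only, the following Python sketch uses a bigram model (N=2) with add-one smoothing and a Viterbi-style search to choose among candidate vowelizations of each word. The training sentences, the candidate lists, the toy Latin transliteration, the smoothing choice, and the names `train`, `log_prob`, and `decode` are all assumptions and do not come from the patent.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical vowelized training sentences (toy transliteration).
CORPUS = [
    "kataba alwaladu aldarsa",
    "kataba alrajulu aldarsa",
    "haathihi kutub jadiida",
]

# Hypothetical candidate vowelizations for each unvowelized word.
CANDIDATES: Dict[str, List[str]] = {
    "ktb": ["kataba", "kutub"],
    "alwld": ["alwaladu"],
    "hthh": ["haathihi"],
}


def train(corpus: List[str]) -> Tuple[Counter, Counter, int]:
    """Count bigram contexts and bigrams; return them with the vocabulary size."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for line in corpus:
        tokens = ["<s>"] + line.split()
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])
    return contexts, bigrams, len(vocab)


def log_prob(prev: str, word: str, contexts: Counter, bigrams: Counter, v: int) -> float:
    """Add-one smoothed log P(word | prev)."""
    return math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))


def decode(words: List[str], contexts: Counter, bigrams: Counter, v: int) -> List[str]:
    """Pick the most probable vowelized sequence under the bigram model."""
    # paths maps the last chosen word to (log score, sequence so far).
    paths: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for w in words:
        new_paths: Dict[str, Tuple[float, List[str]]] = {}
        for cand in CANDIDATES.get(w, [w]):  # unknown words pass through
            score, seq = max(
                (
                    (s + log_prob(prev, cand, contexts, bigrams, v), q)
                    for prev, (s, q) in paths.items()
                ),
                key=lambda item: item[0],
            )
            new_paths[cand] = (score, seq + [cand])
        paths = new_paths
    return max(paths.values(), key=lambda item: item[0])[1]


contexts, bigrams, v = train(CORPUS)
print(decode(["ktb", "alwld"], contexts, bigrams, v))
print(decode(["hthh", "ktb"], contexts, bigrams, v))
```

With this toy data, the ambiguous form `ktb` is resolved differently depending on its neighbours, which is the kind of context sensitivity a language model is meant to add on top of isolated dictionary lookup.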

14. A method for training a speech recognizer with an input text given in an incomplete language and corresponding speech data, the method comprising:
- enriching an input word of said input text given in an incomplete language with missing information using at least one processor programmed to implement a first statistical method configured to operate on a first type of linguistic unit and a second statistical method configured to operate on a second type of linguistic unit, wherein the enriching comprises applying the first statistical method to the input text to generate an intermediate result, and after the first statistical method has been applied, applying the second statistical method to at least a portion of the intermediate result to generate an enriched representation of the input text;
- subjecting the enriched representation of the input text to grapheme-phoneme conversion to produce a phonetic description of said input text; and
- using said phonetic description to train at least one acoustic model to recognize words from said input text.
- Dependent claims (15):
  - 15. The method of claim 14, wherein the at least one acoustic model is a Hidden Markov Model.

16. A computer system, comprising:
- at least one processor programmed to:
  - enrich an input text given in an incomplete language, with missing information using a first statistical method configured to operate on a first type of linguistic unit and a second statistical method configured to operate on a second type of linguistic unit, wherein the enriching comprises applying the first statistical method to the input text to generate an intermediate result, and after the first statistical method has been applied, applying the second statistical method to at least a portion of the intermediate result to generate an enriched representation of the input text; and
  - convert the enriched representation of the input text into speech.
- Dependent claims (17):
  - 17. The computer system according to claim 16, wherein the computer system is operable as a voice server computer system connectable in an electronic network and/or a telephony network by a network interface, wherein the voice server computer system is bi-directionally connected with a client voice browser and/or phone, wherein said voice server computer system further comprises:
    - a voice browser for rendering acoustic, textual input, and/or output information produced by a TTS engine and/or a speech recognition engine; and
    - an application programming interface for filtering information input to said TTS engine and/or said speech recognition engine.

18. A text server computer system, comprising:
- at least one processor programmed to:
  - train a speech recognizer with an input text given in an incomplete language and corresponding speech data; and
  - enrich an input word of said input text with missing information using a first statistical method configured to operate on a first type of linguistic unit and a second statistical method configured to operate on a second type of linguistic unit, wherein the enriching comprises applying the first statistical method to the input text to generate an intermediate result, and after the first statistical method has been applied, applying the second statistical method to at least a portion of the intermediate result to generate an enriched representation of the input text.

19. A non-transitory computer usable medium, encoded with a plurality of instructions that, when executed by a computer, perform a method of supplementing an input text given in an incomplete language with missing information, the method comprising:
- enriching said input text with the missing information using a first statistical method configured to operate on a first type of linguistic unit and a second statistical method configured to operate on a second type of linguistic unit, wherein the enriching comprises applying the first statistical method to the input text to generate an intermediate result, and after the first statistical method has been applied, applying the second statistical method to at least a portion of the intermediate result to generate an enriched representation of the input text.
- Dependent claims (20):
  - 20. The computer usable medium according to claim 19, wherein the method further comprises:
    - subjecting the enriched input text to grapheme-phoneme conversion to produce a phonetic description of said input text; and
    - converting said phonetic description into synthetic speech.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation, Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Fischer, Volker; Emam, Ossama
Primary Examiner(s): JACKSON, JAKIEDA R
Application Number: US10/948,443
Publication Number: US 20050192807A1
Time in Patent Office: 2,623 Days
Field of Search: 704/255, 704/256, 704/9
US Class Current: 704/256
CPC Class Codes: G06F 40/232 Orthographic correction, e....
