Hierarchical approach for the statistical vowelization of Arabic text

US 20050192807A1
Filed: 09/23/2004
Published: 09/01/2005
Est. Priority Date: 02/26/2004
Status: Active Grant


Abstract

The present invention relates to the field of computer-aided text and speech processing, and in particular to a method and respective system for converting an input text given in an incomplete language, for example a language in which unvowelized text is used, into speech, wherein a computer-aided grapheme-phoneme conversion is used. In order to improve completion of the text, it is proposed to a) use statistical methods, including decision trees and stochastic language models, for enriching (i.e. completing) said input text with missing information, which may be desired for a full understanding of the input text, and b) subject the completed input text to said grapheme-phoneme conversion to produce synthetic speech.
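As a rough illustration of how a stochastic language model could choose among candidate vowelizations, the following sketch trains an add-one-smoothed bigram model on a toy corpus. The corpus, the romanized spellings and all function names are assumptions made for this example and do not come from the patent.

```python
import math
from collections import Counter

# Hypothetical toy corpus of vowelized, romanized sentences (illustration only).
CORPUS = [
    "kataba alwaladu aldarsa",
    "qaraa alwaladu kutuban",
    "kataba almuallimu kutuban",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def bigram_logprob(prev, word, unigrams, bigrams):
    """Add-one smoothed log P(word | prev)."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

def pick_vowelization(prev, candidates, unigrams, bigrams):
    """Return the candidate vowelized word that is most probable after prev."""
    return max(candidates, key=lambda w: bigram_logprob(prev, w, unigrams, bigrams))

unigrams, bigrams = train_bigrams(CORPUS)
# The unvowelized skeleton "ktb" is ambiguous; the context decides.
first = pick_vowelization("<s>", ["kataba", "kutuban"], unigrams, bigrams)
later = pick_vowelization("alwaladu", ["kataba", "kutuban"], unigrams, bigrams)
```

In this toy setting the same consonant skeleton resolves differently depending on the preceding word, which is the kind of missing information the statistical enrichment step is meant to supply.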

Advantageously, the text is completed according to a model hierarchy giving higher priority to longer chunks of text, i.e. sentences (310, 315, 320), then multiword phrases (330, 335, 340), then words (350, 355, 360) and finally character groups (370, 375, 380, 390).
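The hierarchy described above can be sketched as a cascade of lookups that tries the longest unit first and falls back to shorter ones. This is a minimal sketch under stated assumptions: the four tables, their romanized entries and the function names are hypothetical, and the probability values that the patent stores per entry are omitted for brevity.

```python
# Hypothetical lookup tables keyed by unvowelized, romanized text (illustration only).
SENTENCES = {"ktb alwld aldrs": "kataba alwaladu aldarsa"}
PHRASES = {"alslam Elykm": "alsalamu Ealaykum"}
WORDS = {"ktb": "kataba", "alwld": "alwaladu"}
CHAR_GROUPS = {"al": "al", "drs": "darsa"}

def vowelize_chars(word):
    """Character level: greedy longest match over character groups.
    Characters with no matching group are passed through unchanged."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in CHAR_GROUPS:
                out.append(CHAR_GROUPS[word[i:j]])
                i = j
                break
        else:
            out.append(word[i])
            i += 1
    return "".join(out)

def vowelize(text):
    """Try sentence, then phrase, then word, then character-group level."""
    if text in SENTENCES:                      # sentence level
        return SENTENCES[text]
    tokens, out, i = text.split(), [], 0
    while i < len(tokens):
        # Phrase level: longest multiword match of at least two tokens.
        for j in range(len(tokens), i + 1, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in PHRASES:
                out.append(PHRASES[phrase])
                i = j
                break
        else:
            token = tokens[i]
            # Word level, with character-group fallback.
            out.append(WORDS[token] if token in WORDS else vowelize_chars(token))
            i += 1
    return " ".join(out)

whole_sentence = vowelize("ktb alwld aldrs")
with_phrase = vowelize("alslam Elykm ktb")
with_fallback = vowelize("ktb aldrs")
```

Preferring the longest match means a known sentence or phrase is reproduced as a whole, and the less reliable character-level completion is reached only for material that no higher level covers.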


17 Claims

1. A method for converting an input text given in an incomplete language into speech, wherein a computer-aided grapheme-phoneme conversion is used, characterized by the steps of:
- a) using statistical methods for enriching said input text with missing information, b) subjecting the enriched input text to said grapheme-phoneme conversion to produce a phonetic description of said input text, c) converting said phonetic description into synthetic speech.
- Dependent claims (3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 16, 17)
  - 3. The method according to claim 1, wherein said statistical methods include applying stochastic language models and/or decision trees.
  - 4. The method according to claim 1, wherein the incompleteness of the text is due to the lack of vowels, and wherein vowelized text is generated.
  - 5. The method according to claim 1, comprising the steps of:
    - a) evaluating a given enrichment corpus for completion of said input text composed of language-specific characters of said incomplete language, said corpus comprising a collection of relevant character combinations and a collection of relevant sequences of predetermined character combinations according to a hierarchical evaluation scheme, wherein respective probability values are stored for the members of said collections for a best match speech item selection, reflecting the most probable language-specific use.
  - 6. The method according to the preceding claim, wherein a given input is first subjected to sentence phrase level mapping, followed by phrase level mapping, followed by word level mapping, followed by character level mapping.
  - 7. The method according to claim 5, wherein said selection is done using a longest match algorithm.
  - 8. The method according to claim 3, wherein said stochastic language models include N-gram language models, N being preferably of the range N = 1, ..., 5.
  - 9. The method according to claim 4, wherein the incomplete language is a Semitic language.
  - 10. The method according to the preceding claim, further comprising the steps of:
    - a) testing if the phonetic description of a completed text element follows a language-specific syllable structure, b) if not, selecting a different vowelized text.
  - 11. The method according to claim 3, further comprising the step of updating dictionaries and/or stochastic language models in use with newly found words, phrases or sentences.
  - 12. The method according to claim 3, further comprising the step of updating dictionaries and/or stochastic language models in use with a feedback information reflecting the actual use of selected words, phrases or sentences.
  - 16. A computer program for execution in a data processing system comprising computer program code portions for performing respective steps of the method according to any one of the preceding claims 1, when said computer program code portions are executed on a computer.
  - 17. A computer program product stored on a computer usable medium comprising computer readable program means for causing a computer to perform the method of any one of the claims 1, when said computer program product is executed on a computer.
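The syllable-structure test of claim 10 might look roughly like the sketch below, which accepts the most probable candidate whose words all pass a pattern check and otherwise moves on to the next candidate. The romanization (vowels a, i, u; long vowels written doubled), the simplified pattern (every syllable opens with exactly one consonant, with CV, CVV, CVC and CVVC allowed anywhere and a two-consonant coda only word-finally) and the function names are assumptions for illustration. The claim applies the test to the phonetic description, for which the romanized string stands in here.

```python
import re

# Assumed simplified word pattern: any number of CV / CVV / CVC / CVVC syllables,
# followed by a final syllable that may end in up to two consonants.
SYLLABLE_WORD = re.compile(
    r"(?:[^aiu][aiu]{1,2}[^aiu]?)*[^aiu][aiu]{1,2}[^aiu]{0,2}"
)

def follows_syllable_structure(word):
    """True if the whole word can be split into permitted syllables."""
    return SYLLABLE_WORD.fullmatch(word) is not None

def select_vowelization(candidates):
    """candidates: list of (vowelized_text, probability) pairs.
    Returns the most probable text whose words all pass the syllable test,
    or None if every candidate fails."""
    for text, _prob in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(follows_syllable_structure(w) for w in text.split()):
            return text
    return None

# "ktaba" is ranked first but starts with a consonant cluster, so it is rejected.
chosen = select_vowelization([("ktaba", 0.6), ("kataba", 0.4)])
```

A check of this kind acts as a cheap filter on top of the statistical ranking: it does not rescore candidates, it only discards those that cannot be pronounced under the assumed syllable inventory.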

2. A method for training a speech recognizer with an input text and corresponding speech data, wherein said input text is given in an incomplete language, characterized by the steps of:
- a) using statistical methods for enriching an input word of said input text with missing information, b) subjecting the enriched input text to said grapheme-phoneme conversion to produce a phonetic description of said input text, c) training acoustic Hidden Markov Models for the recognition of words from said input text.

13. A computer system having a functional component for converting an input text given in an incomplete language into speech, wherein a computer-aided grapheme-phoneme conversion is used, characterized by comprising a functional vowelization program component using statistical methods for enriching said input text with missing information and having access to:
- a) a database comprising language models for words or characters and/or for classes of words or characters, b) a database comprising language models for sentences and/or phrases.
- Dependent claims (14)
  - 14. The computer system according to claim 13, being arranged for being operated as a voice server computer system connectable in an electronic and/or telephony network by respective network interface means and for being cooperated bi-directionally with a client voice browser or phone, wherein said voice server computer system further comprises a voice browser for rendering acoustic or textual input or output information produced by a TTS engine or a speech recognition engine, respectively, and an application programming interface for filtering the input information to said engines.

15. A text server computer system having a functional component for training a speech recognizer with an input text and corresponding speech data, wherein said input text is given in an incomplete language, characterized by comprising a functional vowelization program component using statistical methods for enriching an input word of said input text with missing information and having access to:
- a) a database comprising language models for words or characters and/or for classes of words or characters, b) a database comprising language models for sentences and/or phrases.


Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Fischer, Volker; Emam, Ossama

Granted Patent: US 8,069,045 B2
US Class Current: 704/260
CPC Class Codes: G06F 40/232 Orthographic correction, e....
