Method and apparatus for word pronunciation composition

US 7,099,828 B2
Filed: 11/07/2001
Issued: 08/29/2006
Est. Priority Date: 11/07/2001
Status: Expired due to Term

Abstract

A method of generating pronunciation information can include graphically presenting at least one activatable visual identifier corresponding to individual ones of a plurality of phonemes. Responsive to a selection of one of the visual identifiers, pronunciation information can be generated in accordance with the selected visual identifier. The pronunciation information can be compiled responsive to a selection of one of the plurality of visual identifiers.
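
The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the phoneme symbols, the `PhonemeButton` class, the `PALETTE` contents, and the output format of `compile_pronunciation` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeButton:
    """Hypothetical stand-in for one activatable visual identifier."""
    phoneme: str       # symbol shown on the label (assumed notation)
    example_word: str  # explanatory word illustrating the sound

    @property
    def label(self) -> str:
        return f"{self.phoneme} ({self.example_word})"


# Assumed sample palette; a real tool would present a full phoneme set.
PALETTE = {
    "K": PhonemeButton("K", "cat"),
    "AE": PhonemeButton("AE", "bat"),
    "T": PhonemeButton("T", "top"),
}


def select(pronunciation: list[str], button_id: str) -> list[str]:
    """Selecting a button generates pronunciation information (here, one appended phoneme)."""
    return pronunciation + [PALETTE[button_id].phoneme]


def compile_pronunciation(word: str, pronunciation: list[str]) -> str:
    """Assumed compile step: emit a simple 'word<TAB>phonemes' lexicon line."""
    return f"{word}\t{' '.join(pronunciation)}"


pron: list[str] = []
for button_id in ("K", "AE", "T"):
    pron = select(pron, button_id)

print(PALETTE["K"].label)
print(compile_pronunciation("cat", pron))
```

Keeping selection and compilation as separate functions mirrors the abstract, where generating pronunciation information and compiling it are triggered by separate selections.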

23 Claims

1. A computer-implemented method for composing a pronunciation of a portion of text by generating pronunciation information, the method comprising:
- graphically displaying a first set of activatable visual identifiers, wherein the visual identifiers of said first set are simultaneously displayed in a single display, wherein each of the visual identifiers of said first set uniquely correspond to one of a plurality of phonemes, and wherein each visual identifier of said first set has a label that identifies the corresponding phoneme and that provides an explanatory word representing a sound of the corresponding phoneme;
  
  graphically displaying a second set of activatable visual identifiers simultaneously with said first set of activatable visual identifiers, wherein each of the activatable visual identifiers of the second set uniquely corresponds to one of a plurality of prosodic parameters and has a label identifying the corresponding prosodic parameter;
  
  graphically displaying a third set of activatable visual identifiers simultaneously with said first and second sets of activatable visual identifiers, wherein each of the activatable visual identifiers of the third set uniquely corresponds to one of a plurality of pronunciation stress parameters and has a label identifying the corresponding pronunciation stress parameter;
  
  graphically displaying a fourth set of activatable visual identifiers simultaneously with said first, second, and third sets of activatable visual identifiers, wherein the visual identifiers of said fourth set are simultaneously displayed in said single display, and wherein each of the visual identifiers of the fourth set corresponds to one of a set of actions, said set of actions comprising an adding action, a removing action, and a reordering action for adding, removing, and reordering phonemes, prosodic parameters, and pronunciation parameters to a pronunciation presented in the single display when at least one activatable identifier of the first, second, and third sets is activated in combination with one of the visual identifiers of the fourth set;
  
  responsive to a selection of at least one of said visual identifiers, generating said pronunciation information in accordance with said selected visual identifier, said pronunciation information comprising at least one of a phoneme selected from said plurality of phonemes, an ordering of selected phonemes, a pronunciation stress parameter, and a prosodic parameter;
  
  enabling a user to compose said pronunciation by selectively performing at least one of adding a particular one of the plurality of phonemes, prosodic parameters, and pronunciation stress parameters, removing a particular one of the plurality of phonemes, prosodic parameters, and pronunciation stress parameters, and reordering at least two phonemes by activating at least two activatable visual parameters, said user's selection being based upon said pronunciation information and based upon at least one of an audible rendering of a portion of said pronunciation during said user's composing said pronunciation and without compiling said pronunciation information, an audible rendering of an exemplary word illustrative of a particular phoneme, and a visual rendering of an exemplary word illustrative of the particular phoneme; and
  
  compiling said pronunciation information responsive to a selection of one of said plurality of visual identifiers.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, said generating step comprising:
    - identifying at least one phoneme associated with said selected visual identifier and inserting said identified at least one phoneme into said pronunciation information based on an audible rendering of a portion of said pronunciation during said user's composing said pronunciation and without compiling said pronunciation information.
  - 3. The method of claim 1, said generating step comprising:
    - identifying at least one phoneme associated with said selected visual identifier and removing said identified at least one phoneme from said pronunciation information based on an audible rendering of a portion of said pronunciation during said user's composing said pronunciation and without compiling said pronunciation information.
  - 4. The method of claim 1, wherein said pronunciation information comprises a plurality of phonemes, said generating step comprising:
    - reordering said plurality of phonemes of said pronunciation information.
  - 5. The method of claim 1, wherein said generating step comprises:
    - changing said pronunciation information by changing at least one of a phoneme selected from said plurality of phonemes, an ordering of selected phonemes, a pronunciation stress parameter, and a prosodic parameter.
  - 6. The method of claim 1, wherein said pronunciation information comprises a stress parameter and a prosodic parameter.
  - 7. The method of claim 1, further comprising:
    - playing an audio approximation of said pronunciation information responsive to a selection of one of said plurality of visual identifiers.
  - 8. The method of claim 1, wherein said plurality of phonemes includes phonemes from at least two languages.
  - 9. The method of claim 1, further comprising:
    - storing said pronunciation information in a memory.
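
The adding, removing, and reordering actions recited in claims 1 through 5 can be pictured as edits on an ordered token sequence. The sketch below is only one possible reading: the `Pronunciation` class, the swap-based `reorder`, and the bracketed stress and prosody tokens are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pronunciation:
    """Hypothetical editable sequence of phoneme, stress, and prosody tokens."""
    tokens: list[str] = field(default_factory=list)

    def add(self, token: str, position: Optional[int] = None) -> None:
        """Adding action: insert a token at a position, or append when none is given."""
        if position is None:
            self.tokens.append(token)
        else:
            self.tokens.insert(position, token)

    def remove(self, position: int) -> str:
        """Removing action: delete and return the token at a position."""
        return self.tokens.pop(position)

    def reorder(self, first: int, second: int) -> None:
        """Reordering action: swap the tokens at two positions."""
        self.tokens[first], self.tokens[second] = self.tokens[second], self.tokens[first]


p = Pronunciation()
for token in ("T", "AE", "K"):
    p.add(token)                 # phonemes entered in the wrong order
p.reorder(0, 2)                  # swap first and last phoneme
p.add("[stress:primary]", 0)     # assumed stress marker, placed before the syllable
p.add("[pause]")                 # assumed prosodic marker, appended
removed = p.remove(4)            # drop the prosodic marker again
print(p.tokens, removed)
```

Treating stress and prosodic parameters as tokens in the same sequence as phonemes is a simplification chosen so that one set of actions covers all three kinds of parameter.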

10. A pronunciation composition tool comprising:
- a library comprising a plurality of phonemes;
  
  a graphical user interface comprising a plurality of activatable visual identifiers, wherein said graphical user interface is configured to graphically display simultaneously a first set of activatable visual identifiers, wherein each of the visual identifiers of said first set uniquely correspond to one of a plurality of phonemes, and wherein each visual identifier of said first set has a label that identifies the corresponding phoneme and that provides an explanatory word representing a sound of the corresponding phoneme;
  
  a second set of activatable visual identifiers, wherein each of the activatable visual identifiers of the second set uniquely corresponds to one of a plurality of prosodic parameters and has a label identifying the corresponding prosodic parameter;
  
  a third set of activatable visual identifiers, wherein each of the activatable visual identifiers of the third set uniquely corresponds to one of a plurality of pronunciation stress parameters and has a label identifying the corresponding pronunciation stress parameter, and a fourth set of activatable visual identifiers, wherein the visual identifiers of said fourth set are simultaneously displayed in said single display, and wherein each of the visual identifiers of the fourth set corresponds to one of a set of actions, said set of actions comprising an adding action, a removing action, and a reordering action for adding, removing, and reordering phonemes, prosodic parameters, and pronunciation parameters to a pronunciation presented in the single display when at least one activatable identifier of the first, second, and third sets is activated in combination with one of the visual identifiers of the fourth set; and
  
  a processor configured to generate pronunciation information by including selected ones of said plurality of phonemes from said library responsive to a selection of at least one of said activatable visual identifiers and by enabling a user to compose said pronunciation by selectively causing said processor to perform at least one operation of adding a particular one of the plurality of phonemes and removing a particular one of the plurality of phonemes, said user causing said processor to perform at least one operation based upon said pronunciation information and at least one of an audible rendering of a portion of said pronunciation during said user's composing said pronunciation and without compiling said pronunciation information, an audible rendering of an exemplary word illustrative of a particular phoneme, and a visual rendering of an exemplary word illustrative of the particular phoneme.
- Dependent claims (11, 12, 13, 14)
  - 11. The pronunciation tool of claim 10, further comprising:
    - a text-to-speech system configured to play an audio approximation of said pronunciation information responsive to activation of one of said activatable visual identifiers.
  - 12. The pronunciation composition tool of claim 10, further comprising:
    - a compiler configured to compile said pronunciation information for use with a speech driven application.
  - 13. The pronunciation composition tool of claim 10, wherein said processor is further configured to modify said pronunciation information.
  - 14. The pronunciation tool of claim 10, wherein said plurality of phonemes comprise phonemes corresponding to at least two languages.
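
One way to picture how the components of claims 10 through 14 fit together is shown below. It is a sketch under stated assumptions rather than the claimed apparatus: `CompositionTool`, the `LIBRARY` entries and their symbols, the `speak` callback standing in for a text-to-speech system, and the dictionary returned by `compile` are all hypothetical.

```python
from typing import Callable, Dict, List

# Assumed phoneme library spanning two languages; symbols and example words are illustrative only.
LIBRARY: Dict[str, Dict[str, str]] = {
    "R": {"language": "en", "example": "red"},
    "UW": {"language": "en", "example": "boot"},
    "ZH": {"language": "en", "example": "measure"},
    "EU": {"language": "fr", "example": "peu"},
}


class CompositionTool:
    """Hypothetical processor that edits a draft pronunciation drawn from a library."""

    def __init__(self, library: Dict[str, Dict[str, str]], speak: Callable[[str], None]) -> None:
        self.library = library
        self.speak = speak  # stand-in for a text-to-speech system
        self.phonemes: List[str] = []

    def add(self, symbol: str) -> None:
        """Add a phoneme, accepting only symbols present in the library."""
        if symbol not in self.library:
            raise KeyError(f"unknown phoneme: {symbol}")
        self.phonemes.append(symbol)

    def remove(self, index: int) -> str:
        """Remove and return the phoneme at the given position."""
        return self.phonemes.pop(index)

    def preview(self) -> None:
        """Render the current draft audibly without compiling it."""
        self.speak(" ".join(self.phonemes))

    def preview_example(self, symbol: str) -> None:
        """Render the exemplary word for one phoneme."""
        self.speak(self.library[symbol]["example"])

    def compile(self, word: str) -> Dict[str, object]:
        """Assumed compile step: package the draft for a speech-driven application."""
        return {"word": word, "phonemes": list(self.phonemes)}


spoken: List[str] = []                      # records what would be sent to audio output
tool = CompositionTool(LIBRARY, spoken.append)
tool.preview_example("ZH")
for symbol in ("R", "UW", "ZH", "EU"):
    tool.add(symbol)
tool.preview()                              # listen to the draft before compiling
tool.remove(3)                              # drop the last phoneme after hearing it
tool.preview()
entry = tool.compile("rouge")
print(spoken, entry)
```

Passing the audio output in as a callback keeps the preview path independent of the compile path, which matches the claim language about audible rendering "without compiling said pronunciation information".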

15. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
- graphically displaying a first set of activatable visual identifiers, wherein the visual identifiers of said first set are simultaneously displayed in a single display, wherein each of the visual identifiers of said first set uniquely correspond to one of a plurality of phonemes, and wherein each visual identifier of said first set has a label that identifies the corresponding phoneme and that provides an explanatory word representing a sound of the corresponding phoneme;
  
  graphically displaying a second set of activatable visual identifiers simultaneously with said first set of activatable visual identifiers, wherein each of the activatable visual identifiers of the second set uniquely corresponds to one of a plurality of prosodic parameters and has a label identifying the corresponding prosodic parameter;
  
  graphically displaying a third set of activatable visual identifiers simultaneously with said first and second sets of activatable visual identifiers, wherein each of the activatable visual identifiers of the third set uniquely corresponds to one of a plurality of pronunciation stress parameters and has a label identifying the corresponding pronunciation stress parameter;
  
  graphically displaying a fourth set of activatable visual identifiers simultaneously with said first, second, and third sets of activatable visual identifiers, wherein the visual identifiers of said fourth set are simultaneously displayed in said single display, and wherein each of the visual identifiers of the fourth set corresponds to one of a set of actions, said set of actions comprising an adding action, a removing action, and a reordering action for adding, removing, and reordering phonemes, prosodic parameters, and pronunciation parameters to a pronunciation presented in the single display when at least one activatable identifier of the first, second, and third sets is activated in combination with one of the visual identifiers of the fourth set;
  
  responsive to a selection of at least one of said visual identifiers, generating said pronunciation information in accordance with said selected visual identifier, said pronunciation information comprising at least one of a phoneme selected from said plurality of phonemes, an ordering of selected phonemes, a pronunciation stress parameter, and a prosodic parameter;
  
  enabling a user to compose said pronunciation by selectively performing at least one of adding a particular one of the plurality of phonemes, prosodic parameters, and pronunciation stress parameters, removing a particular one of the plurality of phonemes, prosodic parameters, and pronunciation stress parameters, and reordering at least two phonemes by activating at least two activatable visual parameters, said user's selection being based upon said pronunciation information and based upon at least one of an audible rendering of a portion of said pronunciation during said user's composing said pronunciation and without compiling said pronunciation information, an audible rendering of an exemplary word illustrative of a particular phoneme, and a visual rendering of an exemplary word illustrative of the particular phoneme; and
  
  compiling said pronunciation information responsive to a selection of one of said plurality of visual identifiers.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23)
  - 16. The machine-readable storage of claim 15, said generating step comprising:
    - identifying at least one phoneme associated with said selected visual identifier and inserting said identified at least one phoneme into said pronunciation information based on an audible rendering of a portion of said pronunciation during said user's composing said pronunciation and without compiling said pronunciation information.
  - 17. The machine-readable storage of claim 15, said generating step comprising:
    - identifying at least one phoneme associated with said selected visual identifier and removing said identified at least one phoneme from said pronunciation information based on an audible rendering of a portion of said pronunciation during said user's composing said pronunciation and without compiling said pronunciation information.
  - 18. The machine-readable storage of claim 15, wherein said pronunciation information comprises a plurality of phonemes, said generating step comprising:
    - reordering said plurality of phonemes of said pronunciation information.
  - 19. The machine-readable storage of claim 15, wherein said modifying step comprises:
    - changing said pronunciation information by changing at least one of a phoneme selected from said plurality of phonemes, an ordering of selected phonemes, a pronunciation stress parameter, and a prosodic parameter.
  - 20. The machine-readable storage of claim 15, wherein said pronunciation information comprises a stress parameter and a prosodic parameter.
  - 21. The machine-readable storage of claim 15, further comprising:
    - playing an audio approximation of said pronunciation information responsive to a selection of one of said plurality of visual identifiers.
  - 22. The machine-readable storage of claim 15, wherein said plurality of phonemes include phonemes from at least two languages.
  - 23. The machine-readable storage of claim 15, further comprising:
    - storing said pronunciation information in a memory.

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Reich, David E., Kobal, Jeffrey S., Lucas, Bruce D.
Primary Examiner(s)
Hudspeth, David
Assistant Examiner(s)
Albertalli, Brian Louis

Application Number
US10/007,615
Publication Number
US 20030088415A1
Time in Patent Office
1,756 Days
Field of Search
704/270, 704/258, 704/243
US Class Current
704/270
CPC Class Codes
G10L 15/187 Phonemic context, e.g. pron...
G10L 15/22 Procedures used during a sp...
