Method and system for automatically generating new voice files corresponding to new text from a script

US 5,737,725 A
Filed: 01/09/1996
Issued: 04/07/1998
Est. Priority Date: 01/09/1996
Status: Expired due to Term


Abstract

A method and system for automatically generating at least one new voice file corresponding to at least one new text from a script incorporating a plurality of known text having corresponding preexisting voice files associated therewith. A plurality of phonetic sequences corresponding to the plurality of known text is stored in a first memory. A text input corresponding to a textual version of the script is provided and a text-to-phonetic translator translates the text input to obtain a corresponding textual phonetic sequence based on the plurality of phonetic sequences stored in the first memory. An audio input of the script is provided and a speech recognizer generates an audio phonetic sequence of the audio input. A text-to-speech aligner aligns the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input. The at least one new voice file is generated based on the alignment. The at least one new voice file may be stored in a second memory with the plurality of pre-existing voice files for use with a concatenated voice playback system.

67 Claims

1. A method for automatically generating a new voice file from a script having at least one new speech pattern and a plurality of known speech patterns each having a known phonetic sequence associated therewith, the script having a text input and an audio input, the method comprising:
- translating the text input to obtain a corresponding textual phonetic sequence in order to identify at least one new textual phonetic sequence corresponding to the at least one new speech pattern based on the plurality of known speech patterns;
  
  translating the audio input to obtain a corresponding audio phonetic sequence;
  
  aligning the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input of the script to identify at least one new audio phonetic sequence corresponding to the at least one new speech pattern; and
  
  generating the new voice file containing a portion of the audio input based on the at least one new audio phonetic sequence.
- Dependent claims (6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21):
  - 6. The method as recited in claim 1 further comprising the step of editing the new voice file according to a predetermined set of rules.
  - 7. The method as recited in claim 6 wherein the predetermined set of rules include reducing a level of at least one breath sound of the new voice file by a predetermined amount.
  - 8. The method as recited in claim 6 wherein the predetermined set of rules include editing the new voice file at a zero crossing.
  - 9. The method as recited in claim 6 wherein the predetermined set of rules include editing the new voice file in an inactive portion of the new voice file.
  - 10. The method as recited in claim 6 wherein the predetermined set of rules include editing the new voice file in an active portion of the new voice file.
  - 11. The method as recited in claim 10 wherein the predetermined set of rules include editing a beginning of the new voice file at a zero crossing in a zero to a positive direction.
  - 12. The method as recited in claim 11 wherein the predetermined set of rules include editing an ending of the new voice file at a zero crossing in a negative to a zero direction.
  - 13. The method as recited in claim 10 wherein the predetermined set of rules include editing a beginning of the new voice file at a zero crossing in a zero to a negative direction.
  - 14. The method as recited in claim 13 wherein the predetermined set of rules include editing an ending of the new voice file at a zero crossing in a positive to a zero direction.
  - 15. The method as recited in claim 6 wherein the predetermined set of rules include editing the new voice file at a predetermined amount of time before a beginning of the new voice file.
  - 16. The method as recited in claim 6 wherein the predetermined set of rules include editing the new voice file at a predetermined amount of time after an ending of the new voice file.
  - 17. The method as recited in claim 1 further comprising the step of storing the new voice file in a voice file memory.
  - 18. The method as recited in claim 17 wherein the step of storing includes the step of assigning an identifier to the new voice file.
  - 19. The method as recited in claim 1 further comprising the step of concatenating the new voice file with a selected portion of the plurality of known speech patterns to obtain a natural sounding voice message.
  - 20. The method as recited in claim 1 wherein the step of translating the audio input includes the step of recording the script to obtain a voice recording.
  - 21. The method as recited in claim 1 wherein the step of translating the audio input includes the step of providing a television audio signal and wherein the step of translating the text input includes the step of providing a closed caption decoding of the television audio signal.
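The zero-crossing rules in claims 8 and 11 through 14 can be illustrated with a small sketch. It assumes the voice file is a list of signed sample values and that rough cut points are already known; the function names and the choice to snap the beginning backward and the ending forward are illustrative assumptions, not details taken from the patent.

```python
from typing import Sequence, Tuple


def _starts_here(s: Sequence[float], i: int, rising: bool) -> bool:
    """True if the waveform leaves zero at index i in the requested direction."""
    if i + 1 >= len(s):
        return False
    return s[i] <= 0 < s[i + 1] if rising else s[i] >= 0 > s[i + 1]


def _ends_here(s: Sequence[float], e: int, rising: bool) -> bool:
    """True if the waveform returns to zero at index e (exclusive end of the clip)."""
    if e < 1 or e >= len(s):
        return False
    return s[e - 1] < 0 <= s[e] if rising else s[e - 1] > 0 >= s[e]


def snap_to_zero_crossings(
    samples: Sequence[float], begin: int, end: int, rising: bool = True
) -> Tuple[int, int]:
    """Move rough cut points onto zero crossings.

    rising=True pairs a zero-to-positive beginning with a negative-to-zero
    ending (claims 11 and 12); rising=False pairs a zero-to-negative beginning
    with a positive-to-zero ending (claims 13 and 14).
    Assumption: the beginning is searched backward and the ending forward so
    that no speech is clipped; if no crossing is found the point is unchanged.
    """
    new_begin = next(
        (i for i in range(begin, -1, -1) if _starts_here(samples, i, rising)), begin
    )
    new_end = next(
        (e for e in range(end, len(samples)) if _ends_here(samples, e, rising)), end
    )
    return new_begin, new_end
```

Pairing the two directions this way means the clip starts and stops on the same slope, which is one plausible reason for the claims to group them as they do.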

2. The method as recited in claim 1 wherein the step of translating the text input includes the step of marking the at least one new speech pattern.
- Dependent claims (3, 4, 5):
  - 3. The method as recited in claim 2 wherein the step of marking includes the step of comparing the text input with the known phonetic sequences.
  - 4. The method as recited in claim 2 further comprising the step of adding at least one new textual phonetic sequence corresponding to the at least one new speech pattern to a memory.
  - 5. The method as recited in claim 2 wherein the step of generating includes the step of comparing the marked at least one new speech pattern with the aligned audio input.
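The marking step of claims 2 through 4 can be sketched as a dictionary lookup. The dictionary contents, the ARPAbet-style phoneme symbols, and the function names below are assumptions chosen for illustration; the claims do not specify a phonetic alphabet or a data structure.

```python
from typing import Dict, List, Tuple

# Hypothetical "memory" of known speech patterns and their phonetic sequences.
KNOWN_PHONETICS: Dict[str, List[str]] = {
    "your": ["Y", "AO", "R"],
    "call": ["K", "AO", "L"],
    "is": ["IH", "Z"],
    "from": ["F", "R", "AH", "M"],
}


def mark_new_words(text: str, known: Dict[str, List[str]]) -> List[Tuple[str, bool]]:
    """Return (word, is_new) pairs; a word is new if it has no known phonetic sequence."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    return [(w, w not in known) for w in words if w]


def add_new_sequence(known: Dict[str, List[str]], word: str, phones: List[str]) -> None:
    """Add a new textual phonetic sequence to the memory (compare claim 4)."""
    known[word.lower()] = list(phones)
```

For example, `mark_new_words("Your call is from Eliot.", KNOWN_PHONETICS)` flags only `eliot` as new, and after `add_new_sequence` is called for that word it would be treated as known in later scripts.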

22. A method for automatically generating at least one new voice file corresponding to at least one new text from a script incorporating a plurality of known text having corresponding pre-existing voice files associated therewith, the method comprising:
- storing a plurality of phonetic sequences corresponding to the plurality of known text in a first memory;
  
  providing a text input corresponding to a textual version of the script;
  
  translating the text input to obtain a corresponding textual phonetic sequence based on the plurality of phonetic sequences stored in the first memory;
  
  comparing the text input with the plurality of phonetic sequences stored in the first memory;
  
  marking the at least one new text;
  
  adding at least one new textual phonetic sequence corresponding to the at least one new text in the first memory, the at least one new textual phonetic transcript corresponding to the audio phonetic transcript of the at least one new text;
  
  providing an audio input corresponding to an audio version of the script;
  
  generating an audio phonetic sequence of the audio input by comparing the marked at least one new text with the aligned audio input;
  
  aligning the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input of the script;
  
  generating the at least one new voice file based on the alignment; and
  
  editing the at least one new voice file according to a predetermined set of rules, including reducing a level of at least one breath sound of the at least one new voice file by a predetermined amount.
- Dependent claims (23, 24, 25, 26, 27, 28, 29, 30, 31, 48, 49, 50):
  - 23. The method as recited in claim 22 wherein the predetermined set of rules include editing the at least one new voice file at a zero crossing.
  - 24. The method as recited in claim 22 wherein the predetermined set of rules include editing the at least one new voice file in an inactive portion of the at least one new voice file.
  - 25. The method as recited in claim 22 wherein the predetermined set of rules include editing the at least one new voice file in an active portion of the at least one new voice file.
  - 26. The method as recited in claim 25 wherein the predetermined set of rules include editing a beginning of the at least one new voice file at a zero crossing in a zero to a positive direction.
  - 27. The method as recited in claim 26 wherein the predetermined set of rules include editing an ending of the at least one new voice file at a zero crossing in a negative to a zero direction.
  - 28. The method as recited in claim 25 wherein the predetermined set of rules include editing a beginning of the at least one new voice file at a zero crossing in a zero to a negative direction.
  - 29. The method as recited in claim 28 wherein the predetermined set of rules include editing an ending of the at least one new voice file at a zero crossing in a positive to a zero direction.
  - 30. The method as recited in claim 22 wherein the predetermined set of rules include editing the at least one new voice file at a predetermined amount of time before a beginning of the at least one new voice file.
  - 31. The method as recited in claim 22 wherein the predetermined set of rules include editing the at least one new voice file at a predetermined amount of time after an ending of the at least one new voice file.
  - 48. The system as recited in claim 22 further comprising a voice file memory for storing the new voice file.
  - 49. The system as recited in claim 48 further comprising means for assigning an identifier to the new voice file.
  - 50. The system as recited in claim 48 further comprising means for concatenating the new voice file with a selected portion of the plurality of known speech patterns to obtain a natural sounding voice message.
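Claims 48 through 50 (and claims 17 through 19 on the method side) describe storing a voice file, assigning it an identifier, and concatenating it with other voice files. A minimal sketch follows, assuming voice files are lists of samples held in memory; the class name, the `vf0001` identifier scheme, and the optional silent gap are hypothetical.

```python
from typing import Dict, List, Sequence


class VoiceFileMemory:
    """Toy in-memory store for voice files keyed by an assigned identifier."""

    def __init__(self) -> None:
        self._files: Dict[str, List[float]] = {}
        self._next_id = 1

    def store(self, samples: Sequence[float]) -> str:
        """Store a voice file and return the identifier assigned to it."""
        identifier = f"vf{self._next_id:04d}"  # assumed naming scheme
        self._next_id += 1
        self._files[identifier] = list(samples)
        return identifier

    def concatenate(self, identifiers: Sequence[str], gap_samples: int = 0) -> List[float]:
        """Join stored voice files in order, optionally separated by silence."""
        message: List[float] = []
        for n, identifier in enumerate(identifiers):
            if n > 0:
                message.extend([0.0] * gap_samples)
            message.extend(self._files[identifier])
        return message
```

A playback system would look up the identifiers for the known phrases and the newly generated file, then call `concatenate` to build the full message.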

32. A system for automatically generating a new voice file from a script having at least one new speech pattern and a plurality of known speech patterns each having a known phonetic sequence associated therewith, the script having a text input and an audio input, the system comprising:
- means for translating the text input to obtain a corresponding textual phonetic sequence in order to identify at least one new textual phonetic sequence corresponding to the at least one new speech pattern based on the plurality of known speech patterns;
  
  means for translating the audio input to obtain a corresponding audio phonetic sequence;
  
  means for aligning the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input to identify at least one new audio phonetic sequence corresponding to the new speech pattern; and
  
  means for generating the new voice file containing a portion of the audio input based on the at least one new audio phonetic sequence.
- Dependent claims (33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 51, 52, 53, 54, 55, 56, 57):
  - 33. The system as recited in claim 32 wherein the means for translating the text input further includes means for marking the at least one new speech pattern.
  - 34. The system as recited in claim 33 wherein the means for translating the text input compares the text input with the known phonetic sequences.
  - 35. The system as recited in claim 33 further comprising means for adding at least one new textual phonetic sequence corresponding to the at least one new speech pattern to a memory.
  - 36. The system as recited in claim 33 wherein the means for generating includes means for comparing the marked at least one new speech pattern with the aligned audio input.
  - 37. The system as recited in claim 32 further comprising means for editing the new voice file according to a predetermined set of rules.
  - 38. The system as recited in claim 37 wherein the predetermined set of rules include reducing a level of at least one breath sound of the new voice file by a predetermined amount.
  - 39. The system as recited in claim 37 wherein the predetermined set of rules include editing the new voice file at a zero crossing.
  - 40. The system as recited in claim 37 wherein the predetermined set of rules include editing the new voice file in an inactive portion of the new voice file.
  - 41. The system as recited in claim 37 wherein the predetermined set of rules include editing the new voice file in an active portion of the new voice file.
  - 42. The method as recited in claim 41 wherein the predetermined set of rules include editing a beginning of the new voice file at a zero crossing in a zero to a positive direction.
  - 43. The method as recited in claim 42 wherein the predetermined set of rules include editing an ending of the new voice file at a zero crossing in a negative to a zero direction.
  - 44. The method as recited in claim 41 wherein the predetermined set of rules include editing a beginning of the new voice file at a zero crossing in a zero to a negative direction.
  - 45. The method as recited in claim 44 wherein the predetermined set of rules include editing an ending of the new voice file at a zero crossing in a positive to a zero direction.
  - 46. The system as recited in claim 37 wherein the predetermined set of rules include editing the new voice file at a predetermined amount of time before a beginning of the new voice file.
  - 47. The system as recited in claim 37 wherein the predetermined set of rules include editing the new voice file at a predetermined amount of time after an ending of the new voice file.
  - 51. The system as recited in claim 32 wherein the means for translating the audio input include means for recording the script to obtain a voice recording.
  - 52. The system as recited in claim 51 wherein the means for recording is a microphone.
  - 53. The system as recited in claim 32 wherein the audio input is a television audio signal and wherein the text input is a closed caption decoding of the television audio signal.
  - 54. The system as recited in claim 32 wherein the means for translating text input is a text-to-phonetic translator.
  - 55. The system as recited in claim 32 wherein the means for translating the audio input is a speech recognizer.
  - 56. The system as recited in claim 32 wherein the means for aligning is a text-to-speech aligner.
  - 57. The system as recited in claim 32 wherein the means for translating the audio input and the means for aligning is a text-to-speech aligner.
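The aligning means of claim 32 and the timing rules of claims 46 and 47 can be sketched as follows. The patent does not name an alignment algorithm; this example substitutes Python's `difflib.SequenceMatcher` as a stand-in and assumes the speech recognizer emits phonemes with start and end times. All names and data shapes are illustrative.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

TextPhone = Tuple[str, int]            # (phoneme, index of the word it belongs to)
AudioPhone = Tuple[str, float, float]  # (phoneme, start time in s, end time in s)


def find_word_span(
    text_phones: List[TextPhone], audio_phones: List[AudioPhone], word_index: int
) -> Optional[Tuple[float, float]]:
    """Locate the audio time span of one word by aligning the two phoneme sequences."""
    a = [p for p, _ in text_phones]
    b = [p for p, _, _ in audio_phones]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    hits: List[AudioPhone] = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            if text_phones[block.a + k][1] == word_index:
                hits.append(audio_phones[block.b + k])
    if not hits:
        return None
    return hits[0][1], hits[-1][2]


def pad_span(
    span: Tuple[float, float], pre_s: float, post_s: float, total_s: float
) -> Tuple[float, float]:
    """Widen a span by predetermined amounts of time, clamped to the recording length."""
    start, end = span
    return max(0.0, start - pre_s), min(total_s, end + post_s)
```

Because the matcher tolerates mismatched phonemes elsewhere in the script, a recognizer error in a known word does not prevent the new word from being located.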

58. A system for automatically generating at least one new voice file corresponding to at least one new text from a script incorporating a plurality of known text having corresponding pre-existing voice files associated therewith, the system comprising:
- first memory for storing a plurality of phonetic sequences corresponding to the plurality of known text;
  
  means for providing a text input corresponding to a textual version of the script;
  
  means for translating text input to obtain a corresponding textual phonetic sequence based on a comparison of the textual version of the script with the plurality of phonetic sequences stored in the first memory;
  
  means for marking the at least one new text;
  
  means for adding at least one new textual phonetic sequence corresponding to the at least one new text in the first memory, the new textual phonetic transcript corresponding to the audio phonetic transcript of the at least one new text;
  
  means for providing an audio input corresponding to an audio version of the script;
  
  first means for generating an audio phonetic sequence of the audio input;
  
  means for aligning the text input and the corresponding textual phonetic sequence with the audio input and the corresponding audio phonetic sequence to obtain an alignment of the text input and the audio input;
  
  second means for generating the at least one new voice file based on the alignment based on a comparison of the marked at least one new text with the aligned audio input; and
  
  means for editing the at least one new voice file according to a predetermined set of rules, including reducing a level of at least one breath sound of the at least one new voice file by a predetermined amount.
- Dependent claims (59, 60, 61, 62, 63, 64, 65, 66, 67):
  - 59. The system as recited in claim 58 wherein the predetermined set of rules include editing the new voice file at a zero crossing.
  - 60. The system as recited in claim 58 wherein the predetermined set of rules include editing the new voice file in an inactive portion of the new voice file.
  - 61. The system as recited in claim 58 wherein the predetermined set of rules include editing the new voice file in an active portion of the new voice file.
  - 62. The system as recited in claim 61 wherein the predetermined set of rules include editing a beginning of the at least one new voice file at a zero crossing in a zero to a positive direction.
  - 63. The system as recited in claim 62 wherein the predetermined set of rules include editing an ending of the at least one new voice file at a zero crossing in a negative to a zero direction.
  - 64. The system as recited in claim 61 wherein the predetermined set of rules include editing a beginning of the at least one new voice file at a zero crossing in a zero to a negative direction.
  - 65. The system as recited in claim 64 wherein the predetermined set of rules include editing an ending of the at least one new voice file at a zero crossing in a positive to a zero direction.
  - 66. The system as recited in claim 58 wherein the predetermined set of rules include editing the new voice file at a predetermined amount of time before a beginning of the new voice file.
  - 67. The system as recited in claim 58 wherein the predetermined set of rules include editing the new voice file at a predetermined amount of time after an ending of the new voice file.
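The breath-sound rule recited in claims 22 and 58 reduces the level of a breath by a predetermined amount. A minimal sketch, assuming the breath region has already been located and that the predetermined amount is expressed in decibels (both assumptions; the claims specify neither a detection method nor a unit):

```python
from typing import List, Sequence


def reduce_breath(
    samples: Sequence[float], start: int, end: int, reduction_db: float = 12.0
) -> List[float]:
    """Attenuate samples[start:end] by reduction_db decibels; other samples are unchanged."""
    gain = 10.0 ** (-reduction_db / 20.0)  # convert dB reduction to a linear factor
    return [x * gain if start <= i < end else x for i, x in enumerate(samples)]
```

A 20 dB reduction, for instance, scales the breath region to one tenth of its original amplitude while leaving the surrounding speech untouched.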

Current Assignee: Qwest Communications International Incorporated (Lumen Technologies, Inc.)
Original Assignee: U S West Marketing Resources Group, Inc.
Inventors: Case, Eliot M.
Primary Examiner(s): Tung, Kee M.
Application Number: US08/584,649
Time in Patent Office: 819 Days
Field of Search: 395/2.69, 395/2.79, 395/2.86, 395/2.87, 395/2.67, 395/2.22, 395/2.46, 395/226, 395/227
US Class Current: 704/260
CPC Class Codes:
G06Q 30/0601   Electronic shopping [e-shop...
G10L 13/033   Voice editing, e.g. manipul...
G10L 13/07   Concatenation rules
G10L 13/08   Text analysis or generation...
