PHONETIC DECODING AND CONCATENTIVE SPEECH SYNTHESIS

US 20080133241A1
Filed: 11/15/2007
Published: 06/05/2008
Est. Priority Date: 11/30/2006
Status: Active Grant

8 Assignments

0 Petitions

Abstract

A speech processing system includes a multiplexer that receives speech data input as part of a conversation turn in a conversation session between two or more users where one user is a speaker and each of the other users is a listener in each conversation turn. A speech recognizing engine converts the speech data to an input string of acoustic data while a speech modifier forms an output string based on the input string by changing an item of acoustic data according to a rule. The system also includes a phoneme speech engine for converting the first output string of acoustic data including modified and unmodified data to speech data for output via the multiplexer to listeners during the conversation turn.
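
The flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the toy lexicon, the ARPAbet-style phoneme labels, the `AE` to `AA` rule, and the function names. The recognizer and speech engine are stand-ins that operate on text labels rather than audio.

```python
from typing import Dict, List

Phonemes = List[str]

# Hypothetical toy lexicon; a real recognition engine would decode audio.
TOY_LEXICON: Dict[str, Phonemes] = {
    "bath": ["B", "AE", "TH"],
    "time": ["T", "AY", "M"],
}


def recognize(speech_data: str) -> Phonemes:
    """Stand-in recognizer: maps each known word to its phoneme labels."""
    phonemes: Phonemes = []
    for word in speech_data.lower().split():
        phonemes.extend(TOY_LEXICON[word])
    return phonemes


def modify(input_string: Phonemes, rules: Dict[str, str]) -> Phonemes:
    """Change items that have a rule; pass all other items through unmodified."""
    return [rules.get(item, item) for item in input_string]


def synthesize(output_string: Phonemes) -> str:
    """Stand-in speech engine: joins phoneme labels instead of producing audio."""
    return "-".join(output_string)


def conversation_turn(users: List[str], speaker: str, speech_data: str,
                      rules: Dict[str, str]) -> Dict[str, str]:
    """Route one turn: every user other than the speaker receives the output."""
    output = synthesize(modify(recognize(speech_data), rules))
    return {user: output for user in users if user != speaker}


delivered = conversation_turn(
    users=["alice", "bob", "carol"],
    speaker="alice",
    speech_data="bath time",
    rules={"AE": "AA"},  # illustrative rule only
)
```

In this sketch the dictionary returned by `conversation_turn` plays the role of the multiplexer output: the speaker is excluded and each listener receives the string containing both modified and unmodified items.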

224 Citations

17 Claims

1. A speech processing system for receiving speech data from a speaker during a conversation turn in a conversation session, said speech processing system comprising:
- a phoneme recognition engine for converting the received speech data to an input string of acoustic data;
- a phoneme modification engine for changing at least one item of acoustic data in said input string according to one or more rules to form at least one output string of acoustic data; and
- a phoneme speech engine for converting each formed output string of acoustic data to output speech data for output to at least one listener.

2. A speech processing system according to claim 1 wherein said phoneme modification engine further comprises means for forming an intermediate string from the input string of acoustic data according to one or more rules associated with the speaker and means for forming at least one output string from the intermediate string according to one or more rules associated with at least one listener.

3. A speech processing system according to claim 2 further comprising a grammar engine having means for receiving the intermediate string, means for statistically matching acoustic data in the intermediate string against a set of expected words, and means for making corrections in the intermediate string based on the results of the statistical matching.

4. A speech processing system according to claim 3 further comprising a selection engine for sampling the speech data of each speaker and for selecting one or more rules based on the results of the sampling.

5. A speech processing system according to claim 4 further comprising a rule set database for storing input and output rules associated with one or more classes of speakers and listeners.

6. A speech processing system according to claim 5 further comprising a speech-to-text engine for performing speech-to-text conversion on speech data.
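
The two-stage modification recited in claim 2 (speaker rules produce an intermediate string, then listener rules produce one output string per listener) might be sketched as follows. This is an illustrative reading only: the claims do not specify how rules are represented, so the one-to-one phoneme substitution tables, the function names, and the sample rules are all assumptions.

```python
from typing import Dict, List, Tuple

Phonemes = List[str]
RuleSet = Dict[str, str]  # assumed rule form: source phoneme -> replacement


def apply_rules(string: Phonemes, rules: RuleSet) -> Phonemes:
    """Substitute each item that has a rule; keep the others unchanged."""
    return [rules.get(item, item) for item in string]


def form_output_strings(
    input_string: Phonemes,
    speaker_rules: RuleSet,
    listener_rules: Dict[str, RuleSet],
) -> Tuple[Phonemes, Dict[str, Phonemes]]:
    """Return the intermediate string and one output string per listener."""
    # Stage 1: rules associated with the speaker.
    intermediate = apply_rules(input_string, speaker_rules)
    # Stage 2: rules associated with each listener, applied to the same
    # intermediate string so listeners can receive different outputs.
    outputs = {
        listener: apply_rules(intermediate, rules)
        for listener, rules in listener_rules.items()
    }
    return intermediate, outputs


intermediate, outputs = form_output_strings(
    input_string=["B", "AA", "TH"],
    speaker_rules={"AA": "AE"},  # illustrative speaker-side rule
    listener_rules={"bob": {"AE": "AA"}, "carol": {}},
)
```

Forming the intermediate string once and fanning out from it keeps the speaker-side work independent of the number of listeners, which is one plausible motivation for the split.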

7. A method of processing speech comprising:
- receiving speech data from a speaker during a conversation turn in a conversation session;
- converting the received speech data to an input string of acoustic data;
- changing at least one item of acoustic data in said input string according to one or more rules to form at least one output string of acoustic data; and
- converting each formed output string of acoustic data to output speech data for output to at least one listener.

8. A method of processing speech according to claim 7 further comprising:
- forming an intermediate string from the input string of acoustic data according to one or more rules associated with the speaker; and
- forming at least one output string of acoustic data from the intermediate string according to one or more rules associated with at least one listener.

9. A method of processing speech according to claim 8 further comprising:
- receiving the intermediate string; and
- statistically matching acoustic data in the received intermediate string against a set of expected words; and
- making corrections in the intermediate string based on the results of the statistical matching.

10. A method of processing speech according to claim 9 further comprising:
- sampling the speech data for one or more speakers; and
- selecting one or more rules based on the results of the sampling.

11. A method of processing speech according to claim 10 further comprising storing input and output rules associated with one or more classes of speakers and listeners in a rule set database.

12. A method of processing speech according to claim 11 further comprising performing speech-to-text conversion of the output speech data.
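
The matching and correction steps of claim 9 could look roughly like the sketch below. The claims do not say which statistical measure is used, so this example swaps in a simple sequence-similarity score from Python's standard `difflib` module as a stand-in. The expected-word table, the threshold value, and the assumption that the intermediate string has already been segmented into word-sized pieces are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

Phonemes = List[str]

# Hypothetical set of expected words with reference phoneme sequences.
EXPECTED_WORDS: Dict[str, Phonemes] = {
    "yes": ["Y", "EH", "S"],
    "no": ["N", "OW"],
    "maybe": ["M", "EY", "B", "IY"],
}


def best_match(segment: Phonemes, expected: Dict[str, Phonemes],
               threshold: float = 0.6) -> Optional[Tuple[str, float]]:
    """Return (word, score) for the closest expected word, or None."""
    best: Optional[Tuple[str, float]] = None
    for word, reference in expected.items():
        # Similarity in [0, 1]; used here as a stand-in for a statistical score.
        score = SequenceMatcher(None, segment, reference).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (word, score)
    return best


def correct(segment: Phonemes, expected: Dict[str, Phonemes],
            threshold: float = 0.6) -> Phonemes:
    """Replace the segment with the matched reference, else leave it as is."""
    match = best_match(segment, expected, threshold)
    return list(expected[match[0]]) if match else list(segment)


corrected = correct(["Y", "EH", "Z"], EXPECTED_WORDS)
unchanged = correct(["K", "AE", "T"], EXPECTED_WORDS)
```

The threshold keeps the sketch from forcing a correction when nothing in the expected set is close, so an out-of-vocabulary segment passes through untouched.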

13. A computer program product for processing speech comprising a computer usable medium having computer usable program code embodied therewith, said computer usable program code comprising:
- computer usable program code configured to receive the speech data from a speaker during a conversation turn in a conversation session;
- computer usable program code configured to convert the received speech data to an input string of acoustic data;
- computer usable program code configured to change at least one item of acoustic data in said input string according to one or more rules to form at least one output string of acoustic data; and
- computer usable program code configured to convert each formed output string of acoustic data to output speech data for output to at least one listener.

14. A computer program product for processing speech according to claim 13 further comprising:
- computer usable program code configured to form an intermediate string from the input string of acoustic data according to one or more rules associated with the speaker; and
- computer usable program code configured to form at least one output string of acoustic data from the intermediate string according to one or more rules associated with at least one listener.

15. A computer program product for processing speech according to claim 14 further comprising:
- computer usable program code configured to receive the intermediate string;
- computer usable program code configured to statistically match acoustic data in the received intermediate string against expected words; and
- computer usable program code configured to make corrections in the intermediate string based on the results of the statistical matching.

16. A computer program product for processing speech according to claim 15 further comprising:
- computer usable program code configured to sample the speech data for one or more speakers; and
- computer usable program code configured to select one or more rules based on the results of the sampling.

17. A computer program product for processing speech according to claim 16 further comprising computer usable program code configured to store input and output rules associated with one or more classes of speakers and listeners in a rule set database.
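
Claims 16 and 17 pair a sampling-based rule selection with a rule set database keyed by classes of speakers and listeners. A minimal sketch of one way these could fit together is shown below. The class names, the marker phonemes used to score a sample, and the in-memory dictionary standing in for the database are assumptions for illustration. The sketch also samples an already-decoded phoneme string rather than raw speech data.

```python
from collections import Counter
from typing import Dict, List, Set

Phonemes = List[str]
RuleSet = Dict[str, str]

# Hypothetical rule set database: class -> input rules and output rules.
RULE_SET_DB: Dict[str, Dict[str, RuleSet]] = {
    "class_a": {"input": {"AA": "AE"}, "output": {"AE": "AA"}},
    "class_b": {"input": {"D": "T"}, "output": {"T": "D"}},
}

# Hypothetical marker phonemes assumed to characterize each class.
CLASS_MARKERS: Dict[str, Set[str]] = {
    "class_a": {"AA"},
    "class_b": {"D"},
}


def select_class(sample: Phonemes) -> str:
    """Pick the class whose marker phonemes occur most often in the sample."""
    counts = Counter(sample)
    scores = {
        name: sum(counts[marker] for marker in markers)
        for name, markers in CLASS_MARKERS.items()
    }
    # On a tie, max() keeps the first class in dictionary order.
    return max(scores, key=lambda name: scores[name])


def select_rules(sample: Phonemes, role: str) -> RuleSet:
    """Look up the 'input' or 'output' rules for the selected class."""
    return RULE_SET_DB[select_class(sample)][role]


sample = ["B", "AA", "TH", "AA", "D"]
speaker_class = select_class(sample)
input_rules = select_rules(sample, "input")
```

Keeping input rules (applied for a speaker) and output rules (applied for a listener) under the same class key means one sampling result can serve a user in both roles across conversation turns.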

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Gadd, Richard John; Janke, Eric William; Baker, David Robert; Barnard, Mark Richard
Granted Patent: US 8,027,836 B2
US Class Current: 704/260
CPC Class Codes:
G10L 19/0018 Speech coding using phoneti...
G10L 2015/025 Phonemes, fenemes or fenone...
