Phonetic decoding and concatenative speech synthesis
Abstract
A speech processing system includes a multiplexer that receives speech data input as part of a conversation turn in a conversation session between two or more users where one user is a speaker and each of the other users is a listener in each conversation turn. A speech recognizing engine converts the speech data to an input string of acoustic data while a speech modifier forms an output string based on the input string by changing an item of acoustic data according to a rule. The system also includes a phoneme speech engine for converting the first output string of acoustic data including modified and unmodified data to speech data for output via the multiplexer to listeners during the conversation turn.
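The data flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `SpeechPipeline` class, the `process_turn` method, and the three toy stand-in functions are hypothetical names, and speech data is simplified to space-separated phoneme text so the example runs without any audio library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An "input string of acoustic data", modeled here as a list of phoneme symbols.
Phonemes = List[str]


@dataclass
class SpeechPipeline:
    recognize: Callable[[bytes], Phonemes]   # stand-in for the speech recognizing engine
    modify: Callable[[Phonemes], Phonemes]   # stand-in for the speech modifier
    synthesize: Callable[[Phonemes], bytes]  # stand-in for the phoneme speech engine

    def process_turn(self, speaker: str, users: List[str], speech: bytes) -> Dict[str, bytes]:
        """Handle one conversation turn: every user except the speaker is a listener."""
        output = self.synthesize(self.modify(self.recognize(speech)))
        # Multiplexer role (assumed): fan the output out to each listener.
        return {user: output for user in users if user != speaker}


# Toy stand-ins (assumptions): "speech data" is ASCII phoneme text.
def toy_recognize(speech: bytes) -> Phonemes:
    return speech.decode("ascii").split()


def toy_modify(phonemes: Phonemes) -> Phonemes:
    # Hypothetical rule: replace the phoneme "TH" with "T"; other items pass through unmodified.
    return ["T" if p == "TH" else p for p in phonemes]


def toy_synthesize(phonemes: Phonemes) -> bytes:
    return " ".join(phonemes).encode("ascii")


pipeline = SpeechPipeline(toy_recognize, toy_modify, toy_synthesize)
result = pipeline.process_turn("alice", ["alice", "bob", "carol"], b"TH IH NG K")
```

Here the output string mixes one modified item with three unmodified ones, and both listeners receive the same synthesized data for the turn.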
20 Claims
1. A speech processing system for receiving speech data based on speech from a speaker during a conversation turn in a conversation session, said speech processing system comprising:
- a phoneme recognition engine configured to convert the received speech data to an input string of acoustic data using at least one processor;
- a phoneme modification engine configured to change at least one item of acoustic data in said input string according to one or more rules to form at least one output string of acoustic data, wherein the one or more rules comprise a user rule associated with a user in the conversation session, and wherein the user is selected from the group consisting of the speaker and at least one listener; and
- a phoneme speech engine configured to convert the at least one output string of acoustic data to output speech data for output to the at least one listener.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method of processing speech, the method comprising:
- receiving speech data based on speech from a speaker during a conversation turn in a conversation session;
- converting the received speech data to an input string of acoustic data using at least one processor;
- changing at least one item of acoustic data in said input string according to one or more rules to form at least one output string of acoustic data, wherein the one or more rules comprise a user rule associated with a user in the conversation session, and wherein the user is selected from the group consisting of the speaker and at least one listener; and
- converting each formed output string of acoustic data to output speech data for output to the at least one listener.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer usable non-transitory storage medium storing computer usable program code that, when executed by a processor, performs a method comprising:
- receiving speech data based on speech from a speaker during a conversation turn in a conversation session;
- converting the received speech data to an input string of acoustic data;
- changing at least one item of acoustic data in said input string according to one or more rules to form at least one output string of acoustic data, wherein the one or more rules comprise a user rule associated with a user in the conversation session, and wherein the user is selected from the group consisting of the speaker and at least one listener; and
- converting each formed output string of acoustic data to output speech data for output to the at least one listener.
Dependent claims: 16, 17, 18, 19, 20
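The "user rule" limitation shared by the independent claims can be illustrated with a small sketch. It rests on assumptions the claims do not spell out: a rule is modeled as a phoneme-to-phoneme substitution table, rules are keyed by user name, and the speaker's rules are applied before each listener's. The names `apply_rules`, `output_strings`, and `user_rules` are hypothetical.

```python
from typing import Dict, List

Phonemes = List[str]
Rule = Dict[str, str]  # hypothetical rule format: phoneme -> replacement phoneme


def apply_rules(phonemes: Phonemes, rules: List[Rule]) -> Phonemes:
    """Apply each substitution rule in order; items with no matching rule stay unchanged."""
    out = list(phonemes)
    for rule in rules:
        out = [rule.get(p, p) for p in out]
    return out


def output_strings(
    phonemes: Phonemes,
    speaker: str,
    listeners: List[str],
    user_rules: Dict[str, List[Rule]],
) -> Dict[str, Phonemes]:
    """Form one output string per listener from the speaker's and that listener's rules."""
    speaker_rules = user_rules.get(speaker, [])
    return {
        listener: apply_rules(phonemes, speaker_rules + user_rules.get(listener, []))
        for listener in listeners
    }


# Illustrative rule tables (assumptions, not taken from the patent).
user_rules = {
    "alice": [{"Z": "TH"}],  # rule associated with the speaker
    "bob": [{"R": "L"}],     # rule associated with one listener
}
strings = output_strings(["Z", "IH", "S", "R"], "alice", ["bob", "carol"], user_rules)
```

In this sketch each listener may receive a different output string, which is one way to read "at least one output string" and "each formed output string" in the claims.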
Specification