Intelligent human-machine conversation framework with speech-to-text and text-to-speech
Abstract
A method, computer-readable medium, and system including a speech-to-text module to receive an input of speech including one or more words generated by a human and to output data including text, sentiment information, and other parameters corresponding to the speech input; a processing module like Artificial Intelligence to generate a reply to the speech input, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component; and a text-to-speech module to receive the textual component, sentimental information, and contextual information and to generate, based on the received textual component and its associated sentimental information and contextual information, a speech output including one or more spoken words, the spoken words to be presented with at least one of a pace, a tone, a volume, and an emphasis representative of the sentimental information and contextual information associated with the textual component.
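The three-stage flow in the abstract (speech-to-text output, reply generation, text-to-speech input) can be illustrated with a minimal Python sketch. Every name here (`SpeechInput`, `Reply`, `SpeechPlan`, `generate_reply`, `plan_speech`), the sentiment labels, and the numeric values are hypothetical and chosen only for illustration. The rule-based reply logic is a stand-in for the processing module, not the patented method, and no audio is actually recognized or synthesized.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechInput:
    """Output of the speech-to-text stage (hypothetical field names)."""
    text: str
    sentiment: str                      # e.g. "angry", "neutral"
    other_parameters: dict = field(default_factory=dict)


@dataclass
class Reply:
    """Output of the reply-generation stage."""
    text: str                           # textual component
    sentiment: str                      # sentiment the reply should convey
    context: dict = field(default_factory=dict)


@dataclass
class SpeechPlan:
    """Rendering parameters a text-to-speech stage could consume."""
    text: str
    pace: float                         # 1.0 = assumed normal speaking pace
    tone: str
    volume: float                       # 1.0 = assumed normal loudness
    emphasis: list                      # words to stress


def generate_reply(speech: SpeechInput) -> Reply:
    """Toy rule-based stand-in for the processing module (an assumption)."""
    if speech.sentiment == "angry":
        return Reply(
            text="I am sorry about that. Let me fix it right away.",
            sentiment="apologetic",
            context={"urgent": True},
        )
    return Reply(text="Sure, I can help with that.", sentiment="friendly")


def plan_speech(reply: Reply) -> SpeechPlan:
    """Map the reply's sentiment and context onto delivery parameters."""
    tone = {"apologetic": "soft", "friendly": "warm"}.get(reply.sentiment, "neutral")
    urgent = reply.context.get("urgent", False)
    apologetic = reply.sentiment == "apologetic"
    return SpeechPlan(
        text=reply.text,
        pace=1.2 if urgent else 1.0,
        tone=tone,
        volume=0.8 if apologetic else 1.0,
        emphasis=["sorry"] if apologetic else [],
    )


if __name__ == "__main__":
    heard = SpeechInput(text="This is broken again!", sentiment="angry")
    print(plan_speech(generate_reply(heard)))
```

The point of the sketch is the shape of the data: the reply carries sentiment and context alongside its text, so the speech stage can choose delivery parameters instead of reading the text in a fixed voice.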
15 Claims
1. A system comprising:
a speech-to-text processor to receive an input of speech including one or more words generated by a human and to output data including text, sentiment information, and other parameters information corresponding to the speech input;
a processor to generate a reply to the speech input, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component; and
a text-to-speech processor to receive the textual component, sentimental information, and contextual information of the reply and to generate, based on the received textual component and its associated sentimental information and contextual information of the reply, a speech output including one or more spoken words, the spoken words to be presented with at least one of a pace, a tone, a volume, an urgency, a rate, an accent pattern, and an emphasis representative of the sentimental information and contextual information associated with the textual component of the reply, wherein the at least one of the pace, the tone, the volume, the urgency, the rate, the accent pattern, and the emphasis of the speech output is determined on a word by word basis and a sentence by sentence basis for the speech output.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-implemented method, the method comprising:
receiving, by a first processing module, speech input data derived from speech including one or more words generated by a human, the speech input data including text, sentiment information, and other parameters information corresponding to the speech;
generating, by a second processing module, a reply to the speech input data, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component; and
transmitting, by a third processing module, the textual component, sentimental information, and contextual information of the reply for the generation of, based on the textual component and its associated sentimental information and contextual information of the reply, a speech output including one or more spoken words, the spoken words to be presented with at least one of a pace, a tone, a volume, an urgency, a rate, an accent pattern, and an emphasis representative of the sentimental information and contextual information associated with the textual component of the reply, wherein the at least one of the pace, the tone, the volume, the urgency, the rate, the accent pattern, and the emphasis of the speech output is determined on a word by word basis and a sentence by sentence basis for the speech output.
Dependent claims: 10, 11, 12, 13
14. A non-transitory computer readable medium having processor-executable instructions stored thereon, the medium comprising:
instructions to receive speech input data derived from speech including one or more words generated by a human, the speech input data including text, sentiment information, and other parameters information corresponding to the speech;
instructions to generate a reply to the speech input data, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component; and
instructions to transmit the textual component, sentimental information, and contextual information of the reply for the generation of, based on the textual component and its associated sentimental information and contextual information of the reply, a speech output including one or more spoken words, the spoken words to be presented with at least one of a pace, a tone, a volume, an urgency, a rate, an accent pattern, and an emphasis representative of the sentimental information and contextual information associated with the textual component of the reply wherein the at least one of the pace, the tone, the volume, the urgency, the rate, the accent pattern, and the emphasis of the speech output is determined on a word by word basis and a sentence by sentence basis for the speech output.
Dependent claims: 15
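Claims 1, 9, and 14 each close with the limitation that the delivery parameters are determined on a word by word basis and a sentence by sentence basis. One way to picture that two-level determination is the sketch below, which picks a base rate and volume per sentence and then adjusts them per word. The preset table, the multipliers, and the names `WordProsody`, `SENTENCE_PRESETS`, `split_sentences`, and `annotate` are all assumptions made for illustration. The claims do not specify any of these values or this particular algorithm.

```python
import re
from dataclasses import dataclass


@dataclass
class WordProsody:
    word: str
    rate: float        # relative speaking rate for this word
    volume: float      # relative loudness for this word
    emphasis: bool     # whether the word is stressed


# Hypothetical sentence-level presets keyed by a sentiment label.
SENTENCE_PRESETS = {
    "apologetic": {"rate": 0.9, "volume": 0.8},
    "urgent": {"rate": 1.3, "volume": 1.1},
    "neutral": {"rate": 1.0, "volume": 1.0},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def annotate(text, sentence_sentiments, emphasized_words):
    """Return one list of WordProsody per sentence.

    Sentence level: the sentiment label selects a base rate and volume.
    Word level: words listed in emphasized_words are slowed and made louder.
    """
    sentences = split_sentences(text)
    if len(sentences) != len(sentence_sentiments):
        raise ValueError("one sentiment label is required per sentence")
    plan = []
    for sentence, sentiment in zip(sentences, sentence_sentiments):
        base = SENTENCE_PRESETS.get(sentiment, SENTENCE_PRESETS["neutral"])
        words = []
        for token in sentence.split():
            key = token.strip(".,!?;:").lower()
            stressed = key in emphasized_words
            words.append(WordProsody(
                word=token,
                rate=round(base["rate"] * (0.85 if stressed else 1.0), 3),
                volume=round(base["volume"] * (1.2 if stressed else 1.0), 3),
                emphasis=stressed,
            ))
        plan.append(words)
    return plan


if __name__ == "__main__":
    reply = "I am sorry. Please restart now!"
    for sentence in annotate(reply, ["apologetic", "urgent"], {"sorry", "now"}):
        for w in sentence:
            print(w)
```

In this sketch the per-word values are derived from the per-sentence base, so the two levels are determined together. A real text-to-speech stage might instead express the result in a markup such as SSML, which is outside what the sketch covers.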
Specification