INTELLIGENT HUMAN-MACHINE CONVERSATION FRAMEWORK WITH SPEECH-TO-TEXT AND TEXT-TO-SPEECH

US 20190164554A1
Filed: 11/30/2017
Published: 05/30/2019
Est. Priority Date: 11/30/2017
Status: Active Grant

First Claim

1. A system comprising:

a speech-to-text module to receive an input of speech including one or more words generated by a human and to output data including text, sentiment information, and other parameters information corresponding to the speech input;

a processing module to generate a reply to the speech input, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component; and

a text-to-speech module to receive the textual component, sentimental information, and contextual information of the reply and to generate, based on the received textual component and its associated sentimental information and contextual information of the reply, a speech output including one or more spoken words, the spoken words to be presented with at least one of a pace, a tone, a volume, an urgency, a rate, an accent pattern, and an emphasis representative of the sentimental information and contextual information associated with the textual component of the reply.

2 Assignments

0 Petitions

Abstract

A method, computer-readable medium, and system including a speech-to-text module to receive an input of speech including one or more words generated by a human and to output data including text, sentiment information, and other parameters corresponding to the speech input; a processing module like Artificial Intelligence to generate a reply to the speech input, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component; and a text-to-speech module to receive the textual component, sentimental information, and contextual information and to generate, based on the received textual component and its associated sentimental information and contextual information, a speech output including one or more spoken words, the spoken words to be presented with at least one of a pace, a tone, a volume, and an emphasis representative of the sentimental information and contextual information associated with the textual component.
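
A minimal sketch of the three-module arrangement the abstract describes, under stated assumptions: the class names, field names, sentiment labels, and numeric prosody values below are all hypothetical, and the recognizer and synthesizer are replaced by stand-ins so that only the data flow is shown.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class SpeechInput:
    """Output of the speech-to-text module (hypothetical field names)."""
    text: str
    sentiment: str                      # e.g. "calm", "stressed"
    other_params: dict = field(default_factory=dict)


@dataclass
class Reply:
    """Output of the processing module (hypothetical field names)."""
    text: str
    sentiment: str                      # how the reply should feel
    context: str                        # e.g. "routine", "emergency"


@dataclass
class Prosody:
    pace: float = 1.0                   # relative speaking rate
    volume: float = 1.0                 # relative loudness
    tone: str = "neutral"


def speech_to_text(transcript: str, sentiment: str) -> SpeechInput:
    # Stand-in for a real recognizer: transcript and sentiment are supplied directly.
    return SpeechInput(text=transcript, sentiment=sentiment)


def process(speech: SpeechInput) -> Reply:
    # Toy rule standing in for the processing module's reasoning.
    if speech.sentiment == "stressed":
        return Reply("Shutting the valve now.", "reassuring", "emergency")
    return Reply("The valve is open.", "neutral", "routine")


def text_to_speech(reply: Reply) -> Tuple[str, Prosody]:
    # Choose delivery from the reply's sentiment and context; no audio is synthesized.
    prosody = Prosody()
    if reply.context == "emergency":
        prosody.pace, prosody.volume = 1.3, 1.5
    if reply.sentiment == "reassuring":
        prosody.tone = "calm"
    return reply.text, prosody


def converse(transcript: str, sentiment: str) -> Tuple[str, Prosody]:
    # Chain the three modules: speech-to-text, processing, text-to-speech.
    return text_to_speech(process(speech_to_text(transcript, sentiment)))
```

Calling `converse("Pressure is rising!", "stressed")` in this sketch yields the emergency reply text together with a faster, louder, calm-toned `Prosody`, which illustrates how sentiment and context travel alongside the text rather than being inferred again at synthesis time.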

25 Citations

18 Claims

1. A system comprising:
- a speech-to-text module to receive an input of speech including one or more words generated by a human and to output data including text, sentiment information, and other parameters information corresponding to the speech input;
  
  a processing module to generate a reply to the speech input, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component; and
  
  a text-to-speech module to receive the textual component, sentimental information, and contextual information of the reply and to generate, based on the received textual component and its associated sentimental information and contextual information of the reply, a speech output including one or more spoken words, the spoken words to be presented with at least one of a pace, a tone, a volume, an urgency, a rate, an accent pattern, and an emphasis representative of the sentimental information and contextual information associated with the textual component of the reply.
  - 2. The system of claim 1, wherein the speech-to-text module is to further receive additional information to aid the speech-to-text module to accurately output data including the text, the sentiment information, and the other parameters information corresponding to the speech input, the additional information including at least one of an expected human generated response, a keyword, a probability based distribution, a knowledge of prior speeches, a knowledge of an on-going conversation, and combinations thereof.
  - 3. The system of claim 2, wherein the additional information to aid the speech-to-text module to accurately output data is received from the processing module.
  - 4. The system of claim 1, wherein the processing module receives information used thereby to generate the reply to the speech input from at least one of a database, one or more sensors, one or more controllers, and one or more actuators.
  - 5. The system of claim 1, wherein the textual component, the sentimental information associated with the textual component, and the contextual information associated with the textual component are synchronized to each other.
  - 6. The system of claim 5, wherein the processing module synchronizes the textual component, the sentimental information, and the contextual information to each other.
  - 7. The system of claim 1, wherein the at least one of the pace, the tone, the volume, the urgency, the rate, the accent pattern, and the emphasis of the speech output is determined on a word by word basis and a sentence by sentence basis for the speech output.
  - 8. The system of claim 1, wherein the processing module comprises an Artificial Intelligence processor.
  - 9. The system of claim 1, wherein the text-to-speech module can generate speech based on the received textual component in a plurality of different languages.
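
Claims 5 and 7 can be pictured as keeping the text and its delivery settings index-aligned, with one setting applied per sentence and another per word. The following is a minimal sketch under that reading; `WordMark`, `annotate`, and the numeric scales are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WordMark:
    word: str
    emphasis: float   # 1.0 means no extra emphasis
    pace: float       # relative speaking rate applied to this word


def annotate(sentence: str, sentence_pace: float,
             word_emphasis: Dict[str, float]) -> List[WordMark]:
    """Attach prosody to each word so text and delivery stay aligned.

    sentence_pace is the sentence-level setting; word_emphasis holds
    word-level overrides keyed by lowercase word (hypothetical inputs).
    """
    marks = []
    for word in sentence.split():
        key = word.strip(".,!?").lower()   # ignore punctuation when matching
        marks.append(WordMark(word, word_emphasis.get(key, 1.0), sentence_pace))
    return marks


marks = annotate("Stop the pump now!", 1.2, {"stop": 1.8, "now": 1.5})
```

Here every word inherits the sentence-level pace, while "Stop" and "now!" additionally carry their own emphasis values.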

10. A computer-implemented method, the method comprising:
- receiving, by a processing module, speech input data derived from speech including one or more words generated by a human, the speech input data including text, sentiment information, and other parameters information corresponding to the speech;
  
  generating, by a processing module, a reply to the speech input data, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component; and
  
  transmitting, by a processing module, the textual component, sentimental information, and contextual information of the reply for the generation of, based on the textual component and its associated sentimental information and contextual information of the reply, a speech output including one or more spoken words, the spoken words to be presented with at least one of a pace, a tone, a volume, an urgency, a rate, an accent pattern, and an emphasis representative of the sentimental information and contextual information associated with the textual component of the reply.
  - 11. The method of claim 10, wherein the processing module receives information used thereby to generate the reply to the speech input from at least one of a database, one or more sensors, one or more controllers, and one or more actuators.
  - 12. The method of claim 10, wherein the textual component, the sentimental information associated with the textual component, and the contextual information associated with the textual component are synchronized to each other.
  - 13. The method of claim 12, wherein the processing module synchronizes the textual component, the sentimental information, and the contextual information to each other.
  - 14. The method of claim 10, wherein the at least one of the pace, the tone, the volume, the urgency, the rate, the accent pattern, and the emphasis of the speech output is determined on a word by word basis and a sentence by sentence basis for the speech output.
  - 15. The method of claim 10, wherein the processing module comprises an Artificial Intelligence processor.
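
The receive, generate, and transmit steps of claim 10, together with the sensor input of claim 11, might be sketched as follows. This is an illustration only: `ProcessingModule`, its callbacks, the dictionary keys, and the 90-degree threshold are assumptions made for the example.

```python
from typing import Callable, Dict


class ProcessingModule:
    def __init__(self, read_sensor: Callable[[str], float],
                 transmit: Callable[[Dict[str, str]], None]) -> None:
        self.read_sensor = read_sensor   # hypothetical sensor accessor
        self.transmit = transmit         # hypothetical hand-off toward speech generation

    def handle(self, speech_input: Dict[str, str]) -> Dict[str, str]:
        # Step 1: receive speech input data (text, sentiment, other parameters).
        text = speech_input["text"].lower()
        # Step 2: generate a reply, consulting a sensor when the question needs it.
        if "temperature" in text:
            temp = self.read_sensor("temperature")
            over = temp > 90.0           # illustrative threshold, not from the patent
            reply = {
                "text": f"The temperature is {temp:.0f} degrees.",
                "sentiment": "concerned" if over else "neutral",
                "context": "alarm" if over else "routine",
            }
        else:
            reply = {"text": "Please repeat that.",
                     "sentiment": "neutral", "context": "routine"}
        # Step 3: transmit text, sentiment, and context together.
        self.transmit(reply)
        return reply
```

Passing the sensor reader and the transmit function in as callables keeps the sketch independent of any particular database, controller, or synthesizer.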

16. A non-transitory computer readable medium having processor-executable instructions stored thereon, the medium comprising:
- instructions to receive speech input data derived from speech including one or more words generated by a human, the speech input data including text, sentiment information, and other parameters information corresponding to the speech;
  
  instructions to generate a reply to the speech input data, the reply including a textual component, sentimental information associated with the textual component, and contextual information associated with the textual component; and
  
  instructions to transmit the textual component, sentimental information, and contextual information of the reply for the generation of, based on the textual component and its associated sentimental information and contextual information of the reply, a speech output including one or more spoken words, the spoken words to be presented with at least one of a pace, a tone, a volume, an urgency, a rate, an accent pattern, and an emphasis representative of the sentimental information and contextual information associated with the textual component of the reply.
  - 17. The medium of claim 16, wherein the textual component, the sentimental information associated with the textual component, and the contextual information associated with the textual component are synchronized to each other.
  - 18. The medium of claim 16, wherein the at least one of the pace, the tone, the volume, the urgency, the rate, the accent pattern, and the emphasis of the speech output is determined on a word by word basis and a sentence by sentence basis for the speech output.
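
One way to picture the "instructions to transmit" element of claim 16 is a single message that carries all three reply components together. The sketch below assumes a JSON encoding and the key names `text`, `sentiment`, and `context`; the claims do not specify a wire format, so both are illustrative choices.

```python
import json


def encode_reply(text: str, sentiment: str, context: str) -> str:
    """Serialize the three reply components into one JSON message."""
    return json.dumps(
        {"text": text, "sentiment": sentiment, "context": context},
        sort_keys=True,
    )


def decode_reply(message: str) -> dict:
    """Parse a message and check that all three components arrived."""
    reply = json.loads(message)
    missing = {"text", "sentiment", "context"} - set(reply)
    if missing:
        raise ValueError(f"reply is missing: {sorted(missing)}")
    return reply
```

Rejecting a message that lacks any component reflects the claim's requirement that the speech output be based on the text together with its sentimental and contextual information.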

Current Assignee: GE Digital Holdings LLC (GE Aerospace)
Original Assignee: General Electric Company
Inventors: HUANG, Ching-Ling; VENKATARAMANA, Raju; NISHIDA, Yoshifumi

Granted Patent: US 10,565,994 B2

CPC Class Codes

G06F 16/3329   Natural language query form...

G06F 16/60   of audio data

G06F 16/685   using automatically derived...

G06F 16/90332   Natural language query form...

G06N 5/04   Inference or reasoning models

G06N 7/01   Probabilistic graphical mod...

G10L 13/00   Speech synthesis; Text to s...

G10L 13/033   Voice editing, e.g. manipul...

G10L 13/0335   Pitch control

G10L 13/04   Details of speech synthesis...

G10L 13/047   Architecture of speech synt...

G10L 13/10   Prosody rules derived from ...

G10L 15/22   Procedures used during a sp...

G10L 15/24   Speech recognition using no...

G10L 15/26   Speech to text systems G10L...

G10L 15/28   Constructional details of s...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/225   Feedback of the input speech

G10L 2015/226   using non-speech characteri...

G10L 2015/227   of the speaker; Human-fact...

G10L 2015/228   of application context
